Context¶

As a Data Scientist at ABC Estate Wines, you have been assigned to analyze and forecast the sales of different types of wine in the 20th century. The dataset comprises sales data from the same company but for different wines.

The objective is to gain insights into historical sales patterns and use this information to make accurate forecasts for future wine sales.

By leveraging analytical techniques and forecasting models, you should provide valuable insights and predictions that can help ABC Estate Wines make informed decisions and effectively manage their wine sales in the coming years.

Importing necessary libraries¶

In [320]:
from statsmodels.tsa.arima.model import ARIMA as ar
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.linear_model import LinearRegression
import statsmodels as st
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from statsmodels.tsa.arima.model import ARIMA
import itertools
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


import numpy as np
import pandas as pd
from sklearn import metrics
import matplotlib.pyplot as plt
#import plotly.offline as py

%matplotlib inline
import seaborn as sns
from pylab import rcParams



from sklearn.metrics import root_mean_squared_error # to resolve TypeError: got an unexpected keyword argument 'squared'

Loading the dataset¶

In [321]:
df = pd.read_csv('/content/Sparkling.csv')

Data Overview¶

Displaying the first few rows of the dataset¶

In [322]:
df.head()
Out[322]:
YearMonth Sparkling
0 1980-01 1686
1 1980-02 1591
2 1980-03 2304
3 1980-04 1712
4 1980-05 1471
In [323]:
df.tail()
Out[323]:
YearMonth Sparkling
182 1995-03 1897
183 1995-04 1862
184 1995-05 1670
185 1995-06 1688
186 1995-07 2031

Checking the shape of the dataset¶

In [324]:
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns.")
There are 187 rows and 2 columns.

Checking the data types of the columns for the dataset¶

In [325]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   YearMonth  187 non-null    object
 1   Sparkling  187 non-null    int64 
dtypes: int64(1), object(1)
memory usage: 3.1+ KB

Observations:

  • YearMonth is object, Sparkling is numerical

Data Pre processing¶

In [326]:
# Create a sequence of monthly timestamps to match the number of rows in the dataframe
Time_Stamp = pd.date_range(
    start='1980-01-01',  # The starting date for the time series
    periods=len(df),     # Number of timestamps equals number of rows in df
    freq='M'             # Frequency 'M' means end of every month
)

# Display the generated timestamps
Time_Stamp
Out[326]:
DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',
               '1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
               '1980-09-30', '1980-10-31',
               ...
               '1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
               '1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
               '1995-06-30', '1995-07-31'],
              dtype='datetime64[ns]', length=187, freq='ME')
In [327]:
# Add the generated Time_Stamp as a new column in the dataframe
df['Time_Stamp'] = Time_Stamp

# Display the first 5 rows of the dataframe to verify the new column
df.head()
Out[327]:
YearMonth Sparkling Time_Stamp
0 1980-01 1686 1980-01-31
1 1980-02 1591 1980-02-29
2 1980-03 2304 1980-03-31
3 1980-04 1712 1980-04-30
4 1980-05 1471 1980-05-31
In [328]:
# Set the 'Time_Stamp' column as the dataframe index
# 'inplace=True' means the change will directly modify the existing dataframe (no need to reassign)
df.set_index(keys='Time_Stamp', inplace=True)

# Display the dataframe to confirm that Time_Stamp is now the index
df
Out[328]:
YearMonth Sparkling
Time_Stamp
1980-01-31 1980-01 1686
1980-02-29 1980-02 1591
1980-03-31 1980-03 2304
1980-04-30 1980-04 1712
1980-05-31 1980-05 1471
... ... ...
1995-03-31 1995-03 1897
1995-04-30 1995-04 1862
1995-05-31 1995-05 1670
1995-06-30 1995-06 1688
1995-07-31 1995-07 2031

187 rows × 2 columns

In [329]:
# axis=1 → refers to dropping a column
# inplace=True → applies the change directly to df

df.drop(['YearMonth'], axis=1, inplace=True)  # Drop the 'YearMonth' column from the dataframe
In [330]:
print(df.shape)
(187, 1)
In [331]:
# Confirming the if it YearMonth was successfully dropped
df.head()
Out[331]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471

Checking for missing values¶

In [332]:
# Check the dataframe for missing values
# This returns True for each cell that contains a missing value, otherwise False
df.isnull()
Out[332]:
Sparkling
Time_Stamp
1980-01-31 False
1980-02-29 False
1980-03-31 False
1980-04-30 False
1980-05-31 False
... ...
1995-03-31 False
1995-04-30 False
1995-05-31 False
1995-06-30 False
1995-07-31 False

187 rows × 1 columns

In [333]:
# Making sure that there are no missing values by checking the missing count. Count total of missing values per column
df.isnull().sum()
Out[333]:
0
Sparkling 0

In [334]:
df.head()
Out[334]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471

Exploratory Data Analysis (EDA)¶

Univariate Analysis¶

In [335]:
plt.figure(figsize=(12,6))
# Set the figure size for better readability

plt.hist(df['Sparkling'], bins=20, color='skyblue', edgecolor='black')
# Plot the distribution of Sparkling Wine sales
# bins=20 → number of bars (distribution granularity)
# skyblue bars with black edges for contrast

plt.title("Distribution of Sparkling Wine Sales", fontsize=16)
# Title describing the chart

plt.xlabel("Number of Sales", fontsize=12)
# X-axis shows numeric sales values (univariate focus)

plt.ylabel("Frequency", fontsize=12)
# Y-axis shows how often each sales value occurred

plt.grid(axis='y', linestyle='--', alpha=0.5)
# Light horizontal grid to improve readability
# Applied only to Y-axis for a clean look

plt.tight_layout()
# Adjust layout to prevent label overlap

plt.show()
# Display the final plot
No description has been provided for this image
In [336]:
plt.figure(figsize=(8,5))
sns.boxplot(y=df['Sparkling'], color='lightcoral')
plt.title("Boxplot of Sparkling Wine Sales", fontsize=16)
plt.ylabel("Number of Sales", fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
No description has been provided for this image

Observations¶

  • The histogram shows that most of the sales of Sparkling Wine occur between 1,400 and 2,200 units. Fewer months have much higher sales, reflecting a right-skewed distribution with some periods of high demand.

  • The boxplot shows that generally, the sales of Sparkling Wine range between 1,500 and 2,400 units with a median of around 1,900. The high value outliers are quite numerous, indicating periodic strong seasonal peaks in demand.

Bivariate Analysis¶

TimeStamp vs Sparking Sales¶
In [337]:
plt.figure(figsize=(12, 6))  # Set the figure size for better visibility

plt.plot(df, linewidth=2)    # Plot the dataframe values with a thicker line
plt.title('Sparkling Sales Over Time', fontsize=14)  # Add a title to the plot
plt.xlabel('Time', fontsize=12)  # Label the x-axis as Time
plt.ylabel('Sales', fontsize=12)  # Label the y-axis as Sales
plt.grid(True)  # Add a grid to make the chart easier to read

plt.show()  # Display the plot
No description has been provided for this image
In [338]:
# Group data by year and calculate yearly total/mean sales
yearly_sales = df['Sparkling'].groupby(df.index.year).sum()

plt.figure(figsize=(12, 6))  # Bigger chart for clear visibility

# Choose rainbow colormap and plot
colors = plt.cm.rainbow(range(len(yearly_sales)))  # Rainbow gradient colors

plt.bar(yearly_sales.index, yearly_sales.values, color=colors)

# Labels and title
plt.title('Yearly Sparkling Wine Sales Over Years', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Sales', fontsize=12)

# Display values on bars for better insight
for i, v in enumerate(yearly_sales.values):
    plt.text(yearly_sales.index[i], v, str(round(v, 2)),
             ha='center', va='bottom', fontsize=9)

plt.grid(axis='y', linestyle='--', alpha=0.6)  # Subtle grid for readability
plt.show()
No description has been provided for this image
Sparkling Sales by Year¶
In [339]:
# x = df.index.year extracts the year from the DateTime index
# y = df['Sparkling'] selects the Sparkling sales column to plot
sns.boxplot(x=df.index.year, y=df['Sparkling'])

# Add labels and title for better readability
plt.xlabel('Year', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)
plt.title('Yearly Distribution of Sparkling Sales', fontsize=14)

# Rotate x-axis labels and optionally reduce number of ticks
plt.xticks(rotation=45)  # Rotate year labels so they don’t overlap
plt.grid(True)  # Add grid to make the visualization cleaner

# Show the plot
plt.show()
No description has been provided for this image
Sparkling Sales by Month¶
In [340]:
# x = df.index.month_name() extracts month names from the DateTime index
# y = df['Sparkling'] selects the Sparkling sales column to plot
sns.boxplot(x=df.index.month_name(), y=df['Sparkling'])

# Add labels and title for better understanding
plt.xlabel('Month', fontsize=12)  # Label for x-axis
plt.ylabel('Sparkling Sales', fontsize=12)  # Label for y-axis
plt.title('Monthly Distribution of Sparkling Sales', fontsize=14)  # Plot title

# Rotate x-axis labels to avoid clutter since there are 12 long month names
plt.xticks(rotation=45, ha='right')  # ha='right' aligns labels nicely

plt.grid(True)  # Add grid for improved readability

# Display the plot
plt.show()
No description has been provided for this image
Monthly Sales Across Years¶
In [341]:
monthly_sales_across_years = pd.pivot_table(df, values = 'Sparkling', columns = df.index.month_name(), index = df.index.year)
monthly_sales_across_years
Out[341]:
Time_Stamp April August December February January July June March May November October September
Time_Stamp
1980 1712.0 2453.0 5179.0 1591.0 1686.0 1966.0 1377.0 2304.0 1471.0 4087.0 2596.0 1984.0
1981 1976.0 2472.0 4551.0 1523.0 1530.0 1781.0 1480.0 1633.0 1170.0 3857.0 2273.0 1981.0
1982 1790.0 1897.0 4524.0 1329.0 1510.0 1954.0 1449.0 1518.0 1537.0 3593.0 2514.0 1706.0
1983 1375.0 2298.0 4923.0 1638.0 1609.0 1600.0 1245.0 2030.0 1320.0 3440.0 2511.0 2191.0
1984 1789.0 3159.0 5274.0 1435.0 1609.0 1597.0 1404.0 2061.0 1567.0 4273.0 2504.0 1759.0
1985 1589.0 2512.0 5434.0 1682.0 1771.0 1645.0 1379.0 1846.0 1896.0 4388.0 3727.0 1771.0
1986 1605.0 3318.0 5891.0 1523.0 1606.0 2584.0 1403.0 1577.0 1765.0 3987.0 2349.0 1562.0
1987 1935.0 1930.0 7242.0 1442.0 1389.0 1847.0 1250.0 1548.0 1518.0 4405.0 3114.0 2638.0
1988 2336.0 1645.0 6757.0 1779.0 1853.0 2230.0 1661.0 2108.0 1728.0 4988.0 3740.0 2421.0
1989 1650.0 1968.0 6694.0 1394.0 1757.0 1971.0 1406.0 1982.0 1654.0 4514.0 3845.0 2608.0
1990 1628.0 1605.0 6047.0 1321.0 1720.0 1899.0 1457.0 1859.0 1615.0 4286.0 3116.0 2424.0
1991 1279.0 1857.0 6153.0 2049.0 1902.0 2214.0 1540.0 1874.0 1432.0 3627.0 3252.0 2408.0
1992 1997.0 1773.0 6119.0 1667.0 1577.0 2076.0 1625.0 1993.0 1783.0 4096.0 3088.0 2377.0
1993 2121.0 2795.0 6410.0 1564.0 1494.0 2048.0 1515.0 1898.0 1831.0 4227.0 3339.0 1749.0
1994 1725.0 1495.0 5999.0 1968.0 1197.0 2031.0 1693.0 1720.0 1674.0 3729.0 3385.0 2968.0
1995 1862.0 NaN NaN 1402.0 1070.0 2031.0 1688.0 1897.0 1670.0 NaN NaN NaN
In [342]:
monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');
No description has been provided for this image

Observations¶

  • In the sales of Sparkling Wine, a clear yearly seasonal pattern can be seen, with sharp peaks and an upward trend across the years.

  • Sales generally rose over the years, peaking in about 1988 with the highest total, which confirms long-term demand growth.

  • Most years show stable sales levels, while several years contain high outliers, indicating seasonal surges in demand such as holiday periods.

  • Sales are always highest in December, confirming strong seasonality, and lowest between May and August.

  • December exhibits the most powerful growth throughout the years, while the other months are quite stable with gradual increases, reinforcing a strong seasonal effect at year-end.

Decomposition¶

Additive¶

In [343]:
# Additive Decomposition
# This splits the time series into Trend, Seasonal, and Residual components
decomposition = seasonal_decompose(df, model='additive')

# Plot the decomposed components (Observed, Trend, Seasonal, Residual)
plt.figure(figsize=(12, 8))  # Increase figure size for better readability
decomposition.plot()
plt.suptitle('Additive Seasonal Decomposition of Sparkling Sales', fontsize=16)  # Add a title to the entire figure
plt.tight_layout()  # Adjust layout to avoid overlapping elements
plt.show()  # Display the full decomposition plot
<Figure size 1200x800 with 0 Axes>
No description has been provided for this image
In [344]:
# Extract decomposition components
trend = decomposition.trend          # Long-term pattern
seasonality = decomposition.seasonal # Recurring seasonal effects
residual = decomposition.resid       # Noise/irregular component

# Display a preview of the first 12 values from each component
print("🔹 Trend Component (first 12 values):")
print(trend.head(12))
print("\n" + "-"*50 + "\n")  # Add separator for readability

print("🔹 Seasonal Component (first 12 values):")
print(seasonality.head(12))
print("\n" + "-"*50 + "\n")

print("🔹 Residual Component (first 12 values):")
print(residual.head(12))
print("\n" + "-"*50)

# Check for missing values in each component (important before modeling)
print("\n✅ Missing Values Count:")
print("Trend:", trend.isna().sum())
print("Seasonality:", seasonality.isna().sum())
print("Residual:", residual.isna().sum())
🔹 Trend Component (first 12 values):
Time_Stamp
1980-01-31            NaN
1980-02-29            NaN
1980-03-31            NaN
1980-04-30            NaN
1980-05-31            NaN
1980-06-30            NaN
1980-07-31    2360.666667
1980-08-31    2351.333333
1980-09-30    2320.541667
1980-10-31    2303.583333
1980-11-30    2302.041667
1980-12-31    2293.791667
Name: trend, dtype: float64

--------------------------------------------------

🔹 Seasonal Component (first 12 values):
Time_Stamp
1980-01-31    -854.260599
1980-02-29    -830.350678
1980-03-31    -592.356630
1980-04-30    -658.490559
1980-05-31    -824.416154
1980-06-30    -967.434011
1980-07-31    -465.502265
1980-08-31    -214.332821
1980-09-30    -254.677265
1980-10-31     599.769957
1980-11-30    1675.067179
1980-12-31    3386.983846
Name: seasonal, dtype: float64

--------------------------------------------------

🔹 Residual Component (first 12 values):
Time_Stamp
1980-01-31           NaN
1980-02-29           NaN
1980-03-31           NaN
1980-04-30           NaN
1980-05-31           NaN
1980-06-30           NaN
1980-07-31     70.835599
1980-08-31    315.999487
1980-09-30    -81.864401
1980-10-31   -307.353290
1980-11-30    109.891154
1980-12-31   -501.775513
Name: resid, dtype: float64

--------------------------------------------------

✅ Missing Values Count:
Trend: 12
Seasonality: 0
Residual: 12
Checking for additive assumptions¶
Residual Component ( Additive )¶
In [345]:
# Calculate the mean of the residual component
# If the model decomposition is correct, the residual mean should be close to zero
residual_mean = residual.mean()

# Print the residual mean value
print("Average of Residual Component:", residual_mean)
Average of Residual Component: -1.2088458994707376
In [346]:
# Normality distribution check of the residual component
from scipy.stats import shapiro

# Plot histogram + KDE curve to visually assess normality
plt.figure(figsize=(10,5))  # Bigger for clearer view
sns.histplot(residual.dropna(), kde=True)  # dropna() to remove missing values

# Add title and axis labels for better readability
plt.title('Additive Residuals Distribution (Normality Check)')
plt.xlabel('Residual Values')
plt.ylabel('Frequency')
plt.grid(True)  # Add grid for clarity

plt.show()
No description has been provided for this image
Shapiro-Wilk test for Normality on the Residuals ( Additive )¶
In [347]:
# dropna() removes any missing values that can cause errors in the test
stat, p_value = shapiro(residual.dropna())

# Print the statistical results
print("Shapiro-Wilk Test Statistic:", stat)
print("p-value:", p_value)

# Interpretation of the p-value
# If p-value > 0.05 → residuals are likely normally distributed (good for forecasting models)
# If p-value <= 0.05 → residuals deviate from normality (model assumptions may need revisiting)
if p_value > 0.05:
    print("✅ Residuals appear to follow a normal distribution.")
else:
    print("⚠ Residuals are NOT normally distributed.")
Shapiro-Wilk Test Statistic: 0.9833056372432126
p-value: 0.03433683203800602
⚠ Residuals are NOT normally distributed.

Multiplicative¶

In [348]:
# Multiplicative Decomposition
# Use this when seasonal fluctuations grow/shrink with the level of the series
decomposition = seasonal_decompose(df, model='multiplicative')  # complete the code to multiplicative decomposition

# Plot the observed, trend, seasonal, and residual components
decomposition.plot()
plt.suptitle('Multiplicative Seasonal Decomposition of Sparkling Sales', fontsize=14)
plt.tight_layout()
plt.show()
No description has been provided for this image
In [349]:
# Extract components from the multiplicative decomposition
trend = decomposition.trend           # Long-term movement in the data
seasonality = decomposition.seasonal  # Seasonal repeating pattern over time
residual = decomposition.resid        # Noise/irregular component after removing trend and seasonality

# Display the first 12 entries for each component for inspection
print("🔹 Trend Component (first 12 values):")
print(trend.head(12))  # Check how the trend evolves over early months
print("\n" + "-"*50 + "\n")  # Visual separator

print("🔹 Seasonal Component (first 12 values):")
print(seasonality.head(12))  # Helps understand monthly seasonal effect
print("\n" + "-"*50 + "\n")

print("🔹 Residual Component (first 12 values):")
print(residual.head(12))  # Should ideally look like random noise
print("\n" + "-"*50)

# Check for missing values in each component (common at the boundaries)
print("\n✅ Missing Values Count:")
print("Trend:", trend.isna().sum())
print("Seasonality:", seasonality.isna().sum())
print("Residual:", residual.isna().sum())
🔹 Trend Component (first 12 values):
Time_Stamp
1980-01-31            NaN
1980-02-29            NaN
1980-03-31            NaN
1980-04-30            NaN
1980-05-31            NaN
1980-06-30            NaN
1980-07-31    2360.666667
1980-08-31    2351.333333
1980-09-30    2320.541667
1980-10-31    2303.583333
1980-11-30    2302.041667
1980-12-31    2293.791667
Name: trend, dtype: float64

--------------------------------------------------

🔹 Seasonal Component (first 12 values):
Time_Stamp
1980-01-31    0.649843
1980-02-29    0.659214
1980-03-31    0.757440
1980-04-30    0.730351
1980-05-31    0.660609
1980-06-30    0.603468
1980-07-31    0.809164
1980-08-31    0.918822
1980-09-30    0.894367
1980-10-31    1.241789
1980-11-30    1.690158
1980-12-31    2.384776
Name: seasonal, dtype: float64

--------------------------------------------------

🔹 Residual Component (first 12 values):
Time_Stamp
1980-01-31         NaN
1980-02-29         NaN
1980-03-31         NaN
1980-04-30         NaN
1980-05-31         NaN
1980-06-30         NaN
1980-07-31    1.029230
1980-08-31    1.135407
1980-09-30    0.955954
1980-10-31    0.907513
1980-11-30    1.050423
1980-12-31    0.946770
Name: resid, dtype: float64

--------------------------------------------------

✅ Missing Values Count:
Trend: 12
Seasonality: 0
Residual: 12
Checking for Multiplicative assumptions¶
Residual Component ( Multiplicative )¶
In [350]:
# Extract the residual (error) component from the decomposition
residual = decomposition.resid

# Calculate the mean value of the residuals
# If decomposition separated trend and seasonality properly, this mean should be very close to zero
residual_mean = residual.mean()

# Print the mean of the residuals
print("Mean of Residual Component:", residual_mean)
Mean of Residual Component: 0.9997456359115032
In [351]:
# Plot the distribution of the residual values to visually check if they follow a normal distribution
plt.figure(figsize=(10, 5))  # Increase figure size to improve readability

sns.histplot(residual.dropna(), kde=True)
# dropna() removes missing values
# kde=True overlays a smooth probability curve to visualize the shape of distribution

# Add plot title and axis labels for explanation
plt.title('Multiplicative Residuals Distribution (Normality Check)', fontsize=14)
plt.xlabel('Residual Values', fontsize=12)
plt.ylabel('Frequency', fontsize=12)

plt.grid(True)  # Add a grid for easier interpretation
plt.show()  # Display the plot
No description has been provided for this image
Shapiro-Wilk test for Normality on the Residuals ( Multiplicative )¶
In [352]:
# Perform the Shapiro-Wilk test to check if residuals follow a normal distribution under Multiplicative
# dropna() removes missing values that can cause errors in the test
stat, p_value = shapiro(residual.dropna())

# Display the test results
print("Shapiro-Wilk Test Statistic:", stat)
print("p-value:", p_value)

# Interpretation of the p-value result
# If p-value > 0.05 → residuals are likely normally distributed (good for forecasting models)
# If p-value <= 0.05 → residuals deviate from normality (assumptions may need review)
if p_value > 0.05:
    print("✅ Residuals appear to follow a normal distribution.")
else:
    print("⚠ Residuals are NOT normally distributed.")
Shapiro-Wilk Test Statistic: 0.9859993833004654
p-value: 0.07803310394786477
✅ Residuals appear to follow a normal distribution.
In [353]:
df.describe()
Out[353]:
Sparkling
count 187.000000
mean 2402.417112
std 1295.111540
min 1070.000000
25% 1605.000000
50% 1874.000000
75% 2549.000000
max 7242.000000

Observations¶

Additive

  • There is a clear seasonal pattern every year, with an increasing trend over time and fluctuations captured well in the seasonal component.

  • Residuals show a rather centered distribution but with evident skewness and heavy tails.

  • p < 0.05, The residuals are not normally distributed therefore, the additive model does not adequately describe variability.

Multiplicative

  • Peaks in seasons expand proportionally with the trend, reflecting that the seasonal effects rise as the overall sales increase for which multiplicative seasonality is more suitable.

  • Residuals are more symmetric and closer to a bell-shaped distribution than those from the additive model.

  • p-value > 0.05, Residuals are approximately normally distributed, meaning the multiplicative model fits better.

Conclusion:

  • Based on the tests for decomposition and residual normality, the multiplicative model fits better for Sparkling Wine sales since seasonal variations grow with the increase in the overall level of sales.

Data Preparation for Modeling¶

In [354]:
# Year
df.index.year.unique()
Out[354]:
Index([1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991,
       1992, 1993, 1994, 1995],
      dtype='int32', name='Time_Stamp')
In [355]:
# Month
df.index.month.unique()
Out[355]:
Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype='int32', name='Time_Stamp')

Train / Test Split¶

In [356]:
df_train = df[df.index <= "1991-12-31"]  # Training data up to end of 1991
df_test = df[df.index > "1991-12-31"]   # Testing data from 1992 onward
In [357]:
# Print the shape (rows, columns) of the training set
# This helps verify how much data was assigned to the training portion
print("Training Set Shape:", df_train.shape)

# Print the shape (rows, columns) of the testing set
# This ensures the test period contains data after 1991
print("Testing Set Shape:", df_test.shape)
Training Set Shape: (144, 1)
Testing Set Shape: (43, 1)

Train Dataset after Splitting¶

In [358]:
df_train.head()
Out[358]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471

Test Dataset after Splitting¶

In [359]:
df_test.head()
Out[359]:
Sparkling
Time_Stamp
1992-01-31 1577
1992-02-29 1667
1992-03-31 1993
1992-04-30 1997
1992-05-31 1783

Model Building¶

Linear Regression Model¶

In [360]:
# Create a sequence of time instances for the training dataset
# Adding +1 so the sequence starts from 1
train_time = [i + 1 for i in range(len(df_train))]

# Create a sequence of time instances for the test dataset
# Adding +133 so the test time follows the training period sequentially
# (133 should match the length of df_train to avoid time gaps)
test_time = [i + len(df_train) + 1 for i in range(len(df_test))]

# Display the generated time index ranges
print("Training Time Instances:", train_time[:10], "...")  # Print first 10 values only for cleaner output
print("Total Training Time Points:", len(train_time))

print("\nTest Time Instances:", test_time[:10], "...")  # Print first 10 values for preview
print("Total Test Time Points:", len(test_time))
Training Time Instances: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] ...
Total Training Time Points: 144

Test Time Instances: [145, 146, 147, 148, 149, 150, 151, 152, 153, 154] ...
Total Test Time Points: 43
In [361]:
# Create copies of the train and test dataframes
# This ensures that any modifications we make for Linear Regression modeling
# (like adding features or transformations) do NOT affect the original data
LinearRegression_train = df_train.copy()
LinearRegression_test = df_test.copy()

# Confirm that separate datasets were successfully created
print("Training DataFrame for Linear Regression:", LinearRegression_train.shape)
print("Testing DataFrame for Linear Regression:", LinearRegression_test.shape)
Training DataFrame for Linear Regression: (144, 1)
Testing DataFrame for Linear Regression: (43, 1)
In [362]:
# Add the numeric time index as a feature in the training and test dataframes
# This will be the independent variable for Linear Regression
LinearRegression_train['time'] = train_time
LinearRegression_test['time'] = test_time

# Display first few rows of training data to confirm the new 'time' column was added correctly
print("📌 First few rows of Training Data:")
print(LinearRegression_train.head())
print("\n" + "-"*60 + "\n")  # Separator

# Display last few rows of training data to ensure correct alignment of time feature
print("📌 Last few rows of Training Data:")
print(LinearRegression_train.tail())
print("\n" + "-"*60 + "\n")

# Display first few rows of test data to verify correct continuation of time feature
print("📌 First few rows of Test Data:")
print(LinearRegression_test.head())
print("\n" + "-"*60 + "\n")

# Display last few rows of test data
print("📌 Last few rows of Test Data:")
print(LinearRegression_test.tail())
📌 First few rows of Training Data:
            Sparkling  time
Time_Stamp                 
1980-01-31       1686     1
1980-02-29       1591     2
1980-03-31       2304     3
1980-04-30       1712     4
1980-05-31       1471     5

------------------------------------------------------------

📌 Last few rows of Training Data:
            Sparkling  time
Time_Stamp                 
1991-08-31       1857   140
1991-09-30       2408   141
1991-10-31       3252   142
1991-11-30       3627   143
1991-12-31       6153   144

------------------------------------------------------------

📌 First few rows of Test Data:
            Sparkling  time
Time_Stamp                 
1992-01-31       1577   145
1992-02-29       1667   146
1992-03-31       1993   147
1992-04-30       1997   148
1992-05-31       1783   149

------------------------------------------------------------

📌 Last few rows of Test Data:
            Sparkling  time
Time_Stamp                 
1995-03-31       1897   183
1995-04-30       1862   184
1995-05-31       1670   185
1995-06-30       1688   186
1995-07-31       2031   187
In [363]:
# This will learn the linear relationship between Time_Stamp and Sparkling sales
lr = LinearRegression()

print("Linear Regression Model Initialized ✅")
Linear Regression Model Initialized ✅
In [364]:
LinearRegression_train['Sparkling'].values
Out[364]:
array([1686, 1591, 2304, 1712, 1471, 1377, 1966, 2453, 1984, 2596, 4087,
       5179, 1530, 1523, 1633, 1976, 1170, 1480, 1781, 2472, 1981, 2273,
       3857, 4551, 1510, 1329, 1518, 1790, 1537, 1449, 1954, 1897, 1706,
       2514, 3593, 4524, 1609, 1638, 2030, 1375, 1320, 1245, 1600, 2298,
       2191, 2511, 3440, 4923, 1609, 1435, 2061, 1789, 1567, 1404, 1597,
       3159, 1759, 2504, 4273, 5274, 1771, 1682, 1846, 1589, 1896, 1379,
       1645, 2512, 1771, 3727, 4388, 5434, 1606, 1523, 1577, 1605, 1765,
       1403, 2584, 3318, 1562, 2349, 3987, 5891, 1389, 1442, 1548, 1935,
       1518, 1250, 1847, 1930, 2638, 3114, 4405, 7242, 1853, 1779, 2108,
       2336, 1728, 1661, 2230, 1645, 2421, 3740, 4988, 6757, 1757, 1394,
       1982, 1650, 1654, 1406, 1971, 1968, 2608, 3845, 4514, 6694, 1720,
       1321, 1859, 1628, 1615, 1457, 1899, 1605, 2424, 3116, 4286, 6047,
       1902, 2049, 1874, 1279, 1432, 1540, 2214, 1857, 2408, 3252, 3627,
       6153])
In [365]:
# Train the Linear Regression model using the 'time' feature as the predictor
# 'time' (independent variable) → numerical representation of time progression
# 'Sparkling' (dependent variable) → target sales values we want to predict
lr.fit(LinearRegression_train[['time']], LinearRegression_train['Sparkling'].values)

# Display confirmation message
print("✅ Linear Regression model has been successfully trained.")
✅ Linear Regression model has been successfully trained.
In [366]:
# Use the trained model to make predictions on the test dataset
test_prediction_model = lr.predict(LinearRegression_test[['time']])

# Store the predictions in a new column for comparison
LinearRegression_test['RegOnTime'] = test_prediction_model

# Plot actual vs predicted values to visually evaluate performance
plt.figure(figsize=(14, 7))  # Larger figure for clear visibility

# Plot training data values
plt.plot(df_train['Sparkling'], label='Training Data', linewidth=2)

# Plot actual testing data values
plt.plot(df_test['Sparkling'], label='Actual Test Data', linewidth=2)

# Plot predicted values from Linear Regression model
plt.plot(LinearRegression_test['RegOnTime'],
         label='Predicted Test Data (Linear Regression)',
         linestyle='--', linewidth=2)

# Add title and labels
plt.title('Linear Regression Forecast vs Actual Sparkling Sales', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)

# Add legend and grid for better readability
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()  # Ensure no labels are cut off
plt.show()  # Display the plot
No description has been provided for this image
In [367]:
# Calculate Root Mean Squared Error (RMSE) for model predictions on the test set
# squared=False ensures the metric returned is RMSE (not MSE)
rmse_model_test = metrics.mean_squared_error(
    df_test['Sparkling'],          # Actual values
    test_prediction_model)       # Predicted values
    # squared=False   # I'm getting an error here. TypeError: got an unexpected keyword argument 'squared'


# Display the calculated RMSE in a well-formatted print statement
print(f"✅ For Linear Regression (Time-Based) forecast on Test Data, RMSE ≈ {rmse_model_test:.3f}")
✅ For Linear Regression (Time-Based) forecast on Test Data, RMSE ≈ 1840430.137
In [368]:
# Create a DataFrame to store and display model performance results
# This allows easy comparison later when multiple models (like ARIMA, SARIMA, Prophet) are added
resultsDf = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},  # Store RMSE value
    index=['Regression On Time Model']                     # Label row with model name
)

# Display the performance summary table
print("📊 Model Performance Summary:")
resultsDf
📊 Model Performance Summary:
Out[368]:
Test RMSE
Regression On Time Model 1.840430e+06

Observations:

  • The Linear Regression model does not capture the strong seasonal peaks in the sales of Sparkling Wine.

  • Its predictions form a nearly straight increasing line, resulting in a very high RMSE value.

  • This indicates that Linear Regression is not suitable for the modeling of this highly seasonal time series.

Naive Approach Model¶

In [369]:
# Create copies of the original train and test datasets
# These copies will be used to implement the Naive Forecasting model
NaiveModel_train = df_train.copy()
NaiveModel_test = df_test.copy()

print("✅ Copies created for Naive Model:")
print("Train set shape:", NaiveModel_train.shape)
print("Test set shape:", NaiveModel_test.shape)
✅ Copies created for Naive Model:
Train set shape: (144, 1)
Test set shape: (43, 1)
In [370]:
NaiveModel_train.shape
Out[370]:
(144, 1)
In [371]:
# Confirm that the copies were successfully created
print("✅ Copies created for Naive Model:")
print("Training set shape:", NaiveModel_train.shape)
print("Testing set shape:", NaiveModel_test.shape)
✅ Copies created for Naive Model:
Training set shape: (144, 1)
Testing set shape: (43, 1)
In [372]:
# Display the first few rows of the Naive Model test dataset
# This helps verify that the copied dataset structure looks correct
print("📌 First few rows of Naive Test Data:")
NaiveModel_test.head()
📌 First few rows of Naive Test Data:
Out[372]:
Sparkling
Time_Stamp
1992-01-31 1577
1992-02-29 1667
1992-03-31 1993
1992-04-30 1997
1992-05-31 1783
In [373]:
# Naive Forecast:
# The prediction for every value in the test set is simply
# the *last observed value* from the training period

# Extract the last observed training value
last_train_value = df_train['Sparkling'].iloc[-1]

# Assign the same value to the entire test set as the naive forecast
NaiveModel_test['naive'] = last_train_value

# Show the first few predictions to confirm
print("📌 Naive Forecast - First few predictions:")
NaiveModel_test['naive'].head()
📌 Naive Forecast - First few predictions:
Out[373]:
naive
Time_Stamp
1992-01-31 6153
1992-02-29 6153
1992-03-31 6153
1992-04-30 6153
1992-05-31 6153

In [374]:
# Display the first few rows of the test dataset including Naive forecasts
print("📌 First few rows of NaiveModel_test (with naive predictions):")
NaiveModel_test.head()
📌 First few rows of NaiveModel_test (with naive predictions):
Out[374]:
Sparkling naive
Time_Stamp
1992-01-31 1577 6153
1992-02-29 1667 6153
1992-03-31 1993 6153
1992-04-30 1997 6153
1992-05-31 1783 6153
In [375]:
# Visualizing the Naive Forecast model performance
plt.figure(figsize=(14, 7))  # Larger size for better visibility

# Plot training actual sales
plt.plot(NaiveModel_train['Sparkling'],
         label='Training Data', linewidth=2)

# Plot actual test sales
plt.plot(df_test['Sparkling'],
         label='Actual Test Data', linewidth=2)

# Plot naive predictions on test data
plt.plot(NaiveModel_test['naive'],
         label='Naive Forecast (Test Data)', linestyle='--', linewidth=2)

# Adding title and axis labels for clarity
plt.title('Naive Forecast vs Actual Sparkling Sales', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sales', fontsize=12)

# Add legend in best location & grid for better readability
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()
No description has been provided for this image
In [376]:
# Model Evaluation for Naive Forecast
# squared=False converts MSE → RMSE (Root Mean Squared Error)
rmse_model_test = metrics.mean_squared_error(
    df_test['Sparkling'],          # Actual test values
    NaiveModel_test['naive'])    # Naive forecast values

# False=True Keyword error

# Display the error score in a clean and formatted way
print(f"✅ Naive Forecast RMSE on Test Data: {rmse_model_test:.3f}")
✅ Naive Forecast RMSE on Test Data: 15839720.953
In [377]:
resultsDfN = pd.DataFrame({'Test RMSE': [rmse_model_test]}, index=['NaiveForecast'])
resultsDf = pd.concat([resultsDf, resultsDfN])

# Display the updated performance results
print("📊 Updated Model Performance Table:")
resultsDf
📊 Updated Model Performance Table:
Out[377]:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07

Simple Average Model¶

In [378]:
SimpleAverage_train = df_train.copy()
SimpleAverage_test = df_test.copy()

# Confirm the copies are successfully created
print("Training set shape:", SimpleAverage_train.shape)
print("Testing set shape:", SimpleAverage_test.shape)
Training set shape: (144, 1)
Testing set shape: (43, 1)
In [379]:
# Simple Average Forecast Model
# Every forecasted value = average of the training data Sparkling Sales

# Compute the mean sales from the training dataset
SimpleAverage_test['mean forecast']= df_train['Sparkling'].mean()

# Display the first few rows to confirm the forecast column is added correctly
print("📌 First few rows of the Simple Average Test Data:")
SimpleAverage_test.head()
📌 First few rows of the Simple Average Test Data:
Out[379]:
Sparkling mean forecast
Time_Stamp
1992-01-31 1577 2408.930556
1992-02-29 1667 2408.930556
1992-03-31 1993 2408.930556
1992-04-30 1997 2408.930556
1992-05-31 1783 2408.930556
In [380]:
# Plot the Simple Average Forecast model results
plt.figure(figsize=(14, 7))  # Bigger size for better visualization

# Plot actual training data
plt.plot(df_train['Sparkling'],
         label='Training Data',
         linewidth=2)

# Plot actual test data
plt.plot(df_test['Sparkling'],
         label='Actual Test Data',
         linewidth=2)

# Plot predicted values from Simple Average model
plt.plot(SimpleAverage_test['mean forecast'],
         label='Simple Average Forecast (Test Data)',
         linestyle='--', linewidth=2)

# Title and axis labels
plt.title("Simple Average Forecast vs Actual Sparkling Sales", fontsize=16)
plt.xlabel("Time", fontsize=12)
plt.ylabel("Sales", fontsize=12)

# Add legend and background grid for clarity
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.5)

plt.tight_layout()  # Avoid layout cut-offs
plt.show()
No description has been provided for this image
In [381]:
# Model Evaluation for Simple Average Forecast
# squared=False converts MSE → RMSE (Root Mean Squared Error)
rmse_model_test = root_mean_squared_error(
    df_test['Sparkling'],                 # Actual test values
    SimpleAverage_test['mean forecast']  # Forecasted values
)

# Print RMSE result in a clean formatted way
print(f"✅ RMSE for Simple Average Forecast on Test Data: {rmse_model_test:.3f}")
✅ RMSE for Simple Average Forecast on Test Data: 1268.683
In [382]:
# Create a performance entry for the Simple Average model
resultsDfSES = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},
    index=['SimpleAverageModel']  # ✅ Completed model name
)

# Append it to the main model comparison table
resultsDf = pd.concat([resultsDf, resultsDfSES])

# Display the updated results
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03

Simple Exponential Smoothing Model¶

In [383]:
# Train and test dataset for Simple Exponential Smoothing (SES) model
SES_train = df_train.copy()
SES_test = df_test.copy()

print("✅ SES datasets created:")
print("Train shape:", SES_train.shape)
print("Test shape:", SES_test.shape)
✅ SES datasets created:
Train shape: (144, 1)
Test shape: (43, 1)
In [384]:
# Initialize the Simple Exponential Smoothing (SES) model
# Using the Sparkling sales from the training dataset as the time series input
model_SES = SimpleExpSmoothing(
    SES_train['Sparkling'].astype('float')  # Ensure numeric type for the model
)

print("✅ SES model initialized successfully")
✅ SES model initialized successfully
In [385]:
# Fit the SES model to the training data
# optimized=True automatically selects the best smoothing parameter (alpha)
model_SES_autofit = model_SES.fit(optimized=True)

# Print model summary to review smoothing level and fit statistics
print("✅ SES model successfully fitted with optimized smoothing parameter:")
print(model_SES_autofit.summary())
✅ SES model successfully fitted with optimized smoothing parameter:
                       SimpleExpSmoothing Model Results                       
==============================================================================
Dep. Variable:              Sparkling   No. Observations:                  144
Model:             SimpleExpSmoothing   SSE                      250114511.829
Optimized:                       True   AIC                           2072.937
Trend:                           None   BIC                           2078.876
Seasonal:                        None   AICC                          2073.224
Seasonal Periods:                None   Date:                 Sun, 02 Nov 2025
Box-Cox:                        False   Time:                         13:44:21
Box-Cox Coeff.:                  None                                         
==============================================================================
                       coeff                 code              optimized      
------------------------------------------------------------------------------
smoothing_level            0.0366720                alpha                 True
initial_level              1686.0000                  l.0                False
------------------------------------------------------------------------------
In [386]:
# Display the optimized smoothing parameters learned by the SES model
# This shows the best smoothing level (alpha) chosen during fitting
print("📌 Fitted SES Model Parameters:")
model_SES_autofit.params
📌 Fitted SES Model Parameters:
Out[386]:
{'smoothing_level': np.float64(0.03667203108765097),
 'smoothing_trend': np.float64(nan),
 'smoothing_seasonal': np.float64(nan),
 'damping_trend': nan,
 'initial_level': np.float64(1686.0),
 'initial_trend': np.float64(nan),
 'initial_seasons': array([], dtype=float64),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}
In [387]:
# Generate forecast for the same number of periods as in the test set
# The SES model predicts future values based on the optimized smoothing level
SES_test['SES_forecast'] = model_SES_autofit.forecast(steps=len(df_test))

# Display first few forecast values to verify the output
print("📌 First few SES forecast values on Test Data:")
SES_test.head()
📌 First few SES forecast values on Test Data:
Out[387]:
Sparkling SES_forecast
Time_Stamp
1992-01-31 1577 2635.950609
1992-02-29 1667 2635.950609
1992-03-31 1993 2635.950609
1992-04-30 1997 2635.950609
1992-05-31 1783 2635.950609
In [388]:
# Plot the SES Forecast vs Actual values for both Train and Test sets
plt.figure(figsize=(14, 7))  # Larger figure for clearer visualization

# Plot actual training data
plt.plot(SES_train['Sparkling'], label='Training Data', linewidth=2)

# Plot actual test data
plt.plot(SES_test['Sparkling'], label='Actual Test Data', linewidth=2)

# Plot SES predictions
plt.plot(SES_test['SES_forecast'],
         label='SES Forecast (Optimized Alpha)',
         linestyle='--', linewidth=2)

# Title and axis labeling
plt.title('Simple Exponential Smoothing Forecast vs Actual', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)

plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
No description has been provided for this image

Model Evaluation for 𝛼 = 0.05 : Simple Exponential Smoothing¶

In [389]:
# Calculate RMSE for SES Model on Test Data
rmse_model_test = root_mean_squared_error(
    SES_test['Sparkling'],     # Actual values
    SES_test['SES_forecast']   # SES model predictions
)

# Print formatted result
print(f"✅ SES Model RMSE on Test Data: {rmse_model_test:.3f}")
✅ SES Model RMSE on Test Data: 1293.814
In [390]:
# Create a new row in the results table for the SES model performance
resultsDf_1 = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},
    index=['SES_Model']  # ✅ Model name label
)

# Append the SES model performance to the main results table
resultsDf = pd.concat([resultsDf, resultsDf_1])

# Display updated performance comparison
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03

Setting different alpha values.¶

In [391]:
# Creating an empty DataFrame to store SES performance results
# for different alpha (smoothing level) values
resultsDf_a = pd.DataFrame(columns=['Alpha', 'Train RMSE', 'Test RMSE'])

# Display the initialized DataFrame
print("📊 Initialized Results DataFrame for Alpha Tuning:")
display(resultsDf_a)
📊 Initialized Results DataFrame for Alpha Tuning:
Alpha Train RMSE Test RMSE
In [392]:
alphas = np.arange(0.3, 1.0, 0.1)

for alpha in alphas:
    # Fit model with specified smoothing level (alpha)
    model_SES_alpha = SimpleExpSmoothing(
        SES_train['Sparkling']
    ).fit(smoothing_level=alpha, optimized=False)

    # Predictions for train and test
    train_pred = model_SES_alpha.fittedvalues
    test_pred = model_SES_alpha.forecast(len(SES_test))

    # Compute RMSE (new sklearn function)
    rmse_train = root_mean_squared_error(SES_train['Sparkling'], train_pred)
    rmse_test = root_mean_squared_error(SES_test['Sparkling'], test_pred)

    # Append results to table
    resultsDf_a.loc[len(resultsDf_a)] = [alpha, rmse_train, rmse_test]

# Display results
print("📊 SES Model Results for Multiple Alpha Values:")
display(resultsDf_a)
📊 SES Model Results for Multiple Alpha Values:
Alpha Train RMSE Test RMSE
0 0.3 1362.731346 1900.058569
1 0.4 1356.208919 2260.069389
2 0.5 1347.944758 2606.296390
3 0.6 1343.099607 2924.118301
4 0.7 1343.640190 3214.744366
5 0.8 1349.991473 3483.731806
6 0.9 1362.270034 3736.981096
In [393]:
# Sort the results to find the best alpha (lowest Test RMSE)
resultsDf_a = resultsDf_a.sort_values(by='Test RMSE', ascending=True)

# Display the sorted performance table
print("📊 SES Model Performance Sorted by Test RMSE:")
display(resultsDf_a)
📊 SES Model Performance Sorted by Test RMSE:
Alpha Train RMSE Test RMSE
0 0.3 1362.731346 1900.058569
1 0.4 1356.208919 2260.069389
2 0.5 1347.944758 2606.296390
3 0.6 1343.099607 2924.118301
4 0.7 1343.640190 3214.744366
5 0.8 1349.991473 3483.731806
6 0.9 1362.270034 3736.981096
Get Best Alpha Value¶
In [394]:
best_alpha = resultsDf_a.iloc[0]['Alpha']
print(f"🏆 Best Alpha: {best_alpha:.2f}")
🏆 Best Alpha: 0.30
Refit SES with Best Alpha & Forecast¶
In [395]:
# Fit the best SES model using alpha = 0.30
best_SES_model = SimpleExpSmoothing(
    SES_train['Sparkling']
).fit(smoothing_level=0.30, optimized=False)
Forecast for Test Period¶
In [396]:
SES_train['best_SES_pred'] = best_SES_model.fittedvalues
SES_test['best_SES_pred'] = best_SES_model.forecast(len(SES_test))

SES_test.head()
Out[396]:
Sparkling SES_forecast best_SES_pred
Time_Stamp
1992-01-31 1577 2635.950609 3795.337234
1992-02-29 1667 2635.950609 3795.337234
1992-03-31 1993 2635.950609 3795.337234
1992-04-30 1997 2635.950609 3795.337234
1992-05-31 1783 2635.950609 3795.337234
In [397]:
plt.figure(figsize=(18,9))

plt.plot(SES_train['Sparkling'], label='Training Data', linewidth=2)
plt.plot(SES_test['Sparkling'], label='Actual Test Data', linewidth=2)

plt.plot(SES_test['best_SES_pred'],
         label='SES Forecast (Alpha = 0.30)',
         linestyle='--', linewidth=2)

plt.title('SES Forecast vs Actual - Sparkling Sales', fontsize=16)
plt.xlabel('Time')
plt.ylabel('Sales')
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)

plt.show()
No description has been provided for this image
In [398]:
# Create a results row for the best alpha SES model (Alpha = 0.30)
# We use the minimum Test RMSE value from the sorted alpha tuning results table
resultsDf_2 = pd.DataFrame(
    {'Test RMSE': [resultsDf_a.sort_values(by='Test RMSE', ascending=True).values[0][2]]},
    index=['SES (Alpha = 0.30)']  # ✅ Completed model name
)

# Append to the main comparison results DataFrame
resultsDf = pd.concat([resultsDf, resultsDf_2])

# Display the updated model comparison table
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
SES (Alpha = 0.30) 1.900059e+03

Double Exponential Smoothing (Holt's Model)¶

In [399]:
# Create copies of the train and test datasets for Double Exponential Smoothing (DES)
DES_train = df_train.copy()  # Copy of training dataset
DES_test = df_test.copy()    # Copy of testing dataset

print("✅ DES datasets created:")
print("Train shape:", DES_train.shape)
print("Test shape:", DES_test.shape)
✅ DES datasets created:
Train shape: (144, 1)
Test shape: (43, 1)
In [400]:
# Initialize the Holt's Linear Trend (Double Exponential Smoothing) model
# This model captures both LEVEL and TREND in the time series
model_DES = Holt(DES_train['Sparkling'].astype(float))

print("✅ Holt Model initialized successfully for DES.")
✅ Holt Model initialized successfully for DES.
In [401]:
# Fit the Holt (Double Exponential Smoothing) model
# optimized=True by default → automatically selects the best smoothing parameters (alpha & beta)
model_DES_autofit = model_DES.fit(optimized=True)

# Display the smoothing parameters learned by the model
print("✅ DES Model successfully fitted!")
print("📌 Model Parameters:")
print(model_DES_autofit.params)
✅ DES Model successfully fitted!
📌 Model Parameters:
{'smoothing_level': np.float64(0.6678921248101082), 'smoothing_trend': np.float64(0.00567583364200997), 'smoothing_seasonal': np.float64(nan), 'damping_trend': nan, 'initial_level': np.float64(1686.0), 'initial_trend': np.float64(-95.0), 'initial_seasons': array([], dtype=float64), 'use_boxcox': False, 'lamda': None, 'remove_bias': False}
In [402]:
# Display the optimized smoothing parameters learned by the Holt model
print("📌 Holt DES Model Parameters (α = level, β = trend):")
model_DES_autofit.params
📌 Holt DES Model Parameters (α = level, β = trend):
Out[402]:
{'smoothing_level': np.float64(0.6678921248101082),
 'smoothing_trend': np.float64(0.00567583364200997),
 'smoothing_seasonal': np.float64(nan),
 'damping_trend': nan,
 'initial_level': np.float64(1686.0),
 'initial_trend': np.float64(-95.0),
 'initial_seasons': array([], dtype=float64),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}
In [403]:
# Check the shape of the DES_test dataset
# This tells us the number of rows (time periods) and columns (features) available for forecasting evaluation
print("📌 Shape of DES Test Dataset (rows, columns):")
DES_test.shape
📌 Shape of DES Test Dataset (rows, columns):
Out[403]:
(43, 1)
In [404]:
# Predict future values for the same length as the test dataset
# Using Holt's model with optimized smoothing parameters
DES_test['DES_forecast'] = model_DES_autofit.forecast(steps=len(DES_test))

# Display preview of forecast values
print("📌 First few DES forecast values:")
DES_test.head()
📌 First few DES forecast values:
Out[404]:
Sparkling DES_forecast
Time_Stamp
1992-01-31 1577 5193.610144
1992-02-29 1667 5169.598791
1992-03-31 1993 5145.587438
1992-04-30 1997 5121.576085
1992-05-31 1783 5097.564731
In [405]:
# Plot DES (Double Exponential Smoothing) Forecast vs Actual Data
plt.figure(figsize=(18,9))

# Training history
plt.plot(DES_train['Sparkling'], label='Training Data', linewidth=2)

# Actual future observations
plt.plot(DES_test['Sparkling'], label='Actual Test Data', linewidth=2)

# DES predictions using optimized alpha & beta
alpha = model_DES_autofit.params['smoothing_level']
beta = model_DES_autofit.params['smoothing_trend']

plt.plot(
    DES_test['DES_forecast'],
    label=f'DES Forecast (α={alpha:.3f}, β={beta:.3f})',
    linestyle='--', linewidth=2
)

plt.title('Double Exponential Smoothing (Holt) – Forecast vs Actual', fontsize=16)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sparkling Sales', fontsize=12)

plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.show()
No description has been provided for this image
In [406]:
# Get optimized parameters from the fitted Holt (DES) model
alpha = model_DES_autofit.params['smoothing_level']
beta  = model_DES_autofit.params['smoothing_trend']

# Compute RMSE on the test set using the DES forecast column
rmse_model_test = root_mean_squared_error(
    DES_test['Sparkling'],      # actuals
    DES_test['DES_forecast']    # predictions from DES
)

# Nicely formatted result with true alpha/beta values
print(f"✅ Double Exponential Smoothing on Test Data — RMSE: {rmse_model_test:.3f} "
      f"(α = {alpha:.5f}, β = {beta:.5f})")
✅ Double Exponential Smoothing on Test Data — RMSE: 2654.534 (α = 0.66789, β = 0.00568)
In [407]:
# Create DataFrame entry for DES model performance
resultsDf_DES = pd.DataFrame(
    {'Test RMSE': [rmse_model_test]},
    index=['DES_Model']  # ✅ Completed model name
)

# Append DES results to the main comparison DataFrame
resultsDf = pd.concat([resultsDf, resultsDf_DES])

# Display the updated model leaderboard
print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
SES (Alpha = 0.30) 1.900059e+03
DES_Model 2.654534e+03
In [408]:
# Create an empty DataFrame to store Holt (DES) tuning results
# for different Alpha (level smoothing) and Beta (trend smoothing)
resultsDf_b = pd.DataFrame(columns=['Alpha', 'Beta', 'Train RMSE', 'Test RMSE'])

# Display initialized tuning table
print("📊 Initialized Results table for DES Alpha/Beta tuning:")
display(resultsDf_b)
📊 Initialized Results table for DES Alpha/Beta tuning:
Alpha Beta Train RMSE Test RMSE
In [409]:
# Number of observations in the test dataset
print("📌 Number of periods in Test Set:", len(df_test))
📌 Number of periods in Test Set: 43
In [410]:
# Grids for alpha (level) and beta (trend)
alphas = np.arange(0.3, 1.1, 0.1)
betas  = np.arange(0.3, 1.1, 0.1)

# Ensure resultsDf_b has the right columns
resultsDf_b = pd.DataFrame(columns=['Alpha', 'Beta', 'Train RMSE', 'Test RMSE'])

# Length of test horizon
h = len(DES_test)

for alpha in alphas:
    for beta in betas:
        # Fit Holt (DES) with fixed smoothing params
        model_ij = Holt(DES_train['Sparkling']).fit(
            smoothing_level=alpha,
            smoothing_trend=beta,
            optimized=False
        )

        # Predictions (keep them as local arrays/Series)
        train_pred = model_ij.fittedvalues
        test_pred  = model_ij.forecast(h)

        # RMSE on train and test
        rmse_train = root_mean_squared_error(DES_train['Sparkling'], train_pred)
        rmse_test  = root_mean_squared_error(DES_test['Sparkling'],  test_pred)

        # Append a row to the results table
        resultsDf_b.loc[len(resultsDf_b)] = [alpha, beta, rmse_train, rmse_test]

# Sort to see the best combination on top
resultsDf_b = resultsDf_b.sort_values(by='Test RMSE', ascending=True)

# Peek at the best few
print("📊 Top DES alpha/beta combos by Test RMSE:")
display(resultsDf_b.head())
📊 Top DES alpha/beta combos by Test RMSE:
Alpha Beta Train RMSE Test RMSE
0 0.3 0.3 1599.398181 13814.398308
8 0.4 0.3 1578.236948 18044.669982
1 0.3 0.4 1693.819564 19249.257218
16 0.5 0.3 1538.865716 20684.637333
24 0.6 0.3 1513.676637 22536.689632
In [411]:
# Display all column names in the DES_test DataFrame
print(list(DES_test.columns))
['Sparkling', 'DES_forecast']
In [412]:
# Extract the Best Performing Row Automatically
best_row = resultsDf_b.sort_values(by='Test RMSE', ascending=True).iloc[0]

best_alpha = best_row['Alpha']
best_beta = best_row['Beta']
best_rmse = best_row['Test RMSE']

best_alpha, best_beta, best_rmse
Out[412]:
(np.float64(0.3), np.float64(0.3), np.float64(13814.398308245514))
In [413]:
#Create a Clean, Accurate Performance Entry

# Assign a readable and dynamic model name based on best α and β
model_name = f"DES (Alpha={best_alpha:.2f}, Beta={best_beta:.2f})"

# Create dataframe for this result
resultsDf_3 = pd.DataFrame({'Test RMSE': [best_rmse]}, index=[model_name])

# Append to main results table
resultsDf = pd.concat([resultsDf, resultsDf_3])

print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
SES (Alpha = 0.30) 1.900059e+03
DES_Model 2.654534e+03
DES (Alpha=0.30, Beta=0.30) 1.381440e+04

Triple Exponential Smoothing (Holt - Winter's Model)¶

In [414]:
# Creating copies of the train and test datasets for Holt-Winter's Model
TES_train = df_train.copy()
TES_test = df_test.copy()

print("✅ TES datasets created successfully")
print("Train shape:", TES_train.shape)
print("Test shape:", TES_test.shape)
✅ TES datasets created successfully
Train shape: (144, 1)
Test shape: (43, 1)
In [415]:
model_TES = ExponentialSmoothing(TES_train['Sparkling'],trend='additive',seasonal='multiplicative',freq='M')
In [416]:
# Initialize the Holt-Winters Triple Exponential Smoothing (TES) Model
model_TES = ExponentialSmoothing(
    TES_train['Sparkling'],
    trend='add',          # Additive trend component (linear movement over time)
    seasonal='mul',       # Multiplicative seasonal effect (seasonal variation grows over time)
    seasonal_periods=12   # 12 months → yearly seasonality
)

print("✅ Holt-Winters TES model initialized successfully!")

# NOTE: freq='M' is no longer supported as a parameter
✅ Holt-Winters TES model initialized successfully!
In [417]:
# Fit the Holt-Winters TES model
# optimized=True by default → automatically tunes alpha, beta, gamma
model_TES_autofit = model_TES.fit(optimized=True)   # optimized=True to find the best possible values for alpha, beta, and gamma to improve forecast accuracy.

# Show the final parameters learned by the model
print("✅ TES model successfully fitted!")
print("📌 Optimized parameters:")
print(model_TES_autofit.params)
✅ TES model successfully fitted!
📌 Optimized parameters:
{'smoothing_level': np.float64(0.07491295595488828), 'smoothing_trend': np.float64(0.07491295595488828), 'smoothing_seasonal': np.float64(0.2833959688663245), 'damping_trend': nan, 'initial_level': np.float64(2356.5020981873167), 'initial_trend': np.float64(-19.55614396441527), 'initial_seasons': array([0.69642481, 0.66263147, 0.8661978 , 0.78371893, 0.6409794 ,
       0.62710199, 0.86203455, 1.11207461, 0.90589561, 1.22639337,
       1.87827821, 2.43148971]), 'use_boxcox': False, 'lamda': None, 'remove_bias': False}
In [418]:
# Display the optimized smoothing parameters learned by the Holt-Winters TES model
print("📌 Optimized parameters from TES Model:")
model_TES_autofit.params
📌 Optimized parameters from TES Model:
Out[418]:
{'smoothing_level': np.float64(0.07491295595488828),
 'smoothing_trend': np.float64(0.07491295595488828),
 'smoothing_seasonal': np.float64(0.2833959688663245),
 'damping_trend': nan,
 'initial_level': np.float64(2356.5020981873167),
 'initial_trend': np.float64(-19.55614396441527),
 'initial_seasons': array([0.69642481, 0.66263147, 0.8661978 , 0.78371893, 0.6409794 ,
        0.62710199, 0.86203455, 1.11207461, 0.90589561, 1.22639337,
        1.87827821, 2.43148971]),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}
In [419]:
# Predict future values for the same length as the test dataset
# Using the optimized Holt-Winters TES model
TES_test['TES_forecast'] = model_TES_autofit.forecast(steps=len(df_test))

# Preview the forecast results
print("📌 First few TES forecast values on Test Data:")
TES_test.head()
📌 First few TES forecast values on Test Data:
Out[419]:
Sparkling TES_forecast
Time_Stamp
1992-01-31 1577 1699.806254
1992-02-29 1667 1588.100259
1992-03-31 1993 1797.378411
1992-04-30 1997 1568.061775
1992-05-31 1783 1518.346802
In [420]:
# Plot TES Forecast vs Actual
plt.figure(figsize=(18,9))

# Plot Training Data
plt.plot(TES_train['Sparkling'], label='Training Data', linewidth=2)

# Plot Actual Test Data
plt.plot(TES_test['Sparkling'], label='Actual Test Data', linewidth=2)

# Extract optimized parameters
alpha = model_TES_autofit.params['smoothing_level']
beta  = model_TES_autofit.params['smoothing_trend']
gamma = model_TES_autofit.params['smoothing_seasonal']

# Plot the TES Forecast
plt.plot(
    TES_test['TES_forecast'],
    label=f'TES Forecast (α={alpha:.3f}, β={beta:.3f}, γ={gamma:.3f})',
    linestyle='--',
    linewidth=2
)

# Formatting & Labels
plt.title("Holt-Winters Triple Exponential Smoothing - Forecast vs Actual", fontsize=16)
plt.xlabel("Time", fontsize=12)
plt.ylabel("Sparkling Sales", fontsize=12)
plt.legend(loc='best')
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()

plt.show()
No description has been provided for this image
In [421]:
# Extract optimized smoothing parameters from the model
alpha = model_TES_autofit.params['smoothing_level']
beta  = model_TES_autofit.params['smoothing_trend']
gamma = model_TES_autofit.params['smoothing_seasonal']

# Calculate RMSE on the test set using TES forecast
rmse_model_test = root_mean_squared_error(
    TES_test['Sparkling'],       # actual values
    TES_test['TES_forecast']     # forecast from TES model
)

# Display results properly formatted with real values
print(f"✅ TES Model RMSE on Test Data: {rmse_model_test:.3f} "
      f"(α={alpha:.3f}, β={beta:.3f}, γ={gamma:.3f})")
✅ TES Model RMSE on Test Data: 350.053 (α=0.075, β=0.075, γ=0.283)
In [422]:
# Create an empty DataFrame to store TES alpha, beta, gamma tuning results
resultsDf_c = pd.DataFrame(
    columns=['Alpha', 'Beta', 'Gamma', 'Train RMSE', 'Test RMSE']
)

# Display initialized table
print("📊 Initialized TES parameter tuning table:")
display(resultsDf_c)
📊 Initialized TES parameter tuning table:
Alpha Beta Gamma Train RMSE Test RMSE
In [423]:
# Ensure results table has clean column names
resultsDf_c = pd.DataFrame(columns=['Alpha', 'Beta', 'Gamma', 'Train RMSE', 'Test RMSE'])

alphas = np.arange(0.3, 1.1, 0.1)
betas  = np.arange(0.3, 1.1, 0.1)
gammas = np.arange(0.3, 1.1, 0.1)
h = len(TES_test)

row = 0
for alpha in alphas:
    for beta in betas:
        for gamma in gammas:
            # Fit TES with fixed smoothing params
            model = ExponentialSmoothing(
                TES_train['Sparkling'],
                trend='add',
                seasonal='mul',
                seasonal_periods=12
            ).fit(
                smoothing_level=alpha,
                smoothing_trend=beta,
                smoothing_seasonal=gamma,
                optimized=False
            )

            # In-sample fitted values and out-of-sample forecast
            train_pred = model.fittedvalues
            test_pred  = model.forecast(h)

            # RMSEs
            rmse_train = root_mean_squared_error(TES_train['Sparkling'], train_pred)
            rmse_test  = root_mean_squared_error(TES_test['Sparkling'],  test_pred)

            # Append a row
            resultsDf_c.loc[row] = [alpha, beta, gamma, rmse_train, rmse_test]
            row += 1

# Sort so the best (lowest Test RMSE) is first
resultsDf_c = resultsDf_c.sort_values(by='Test RMSE', ascending=True)

print("📊 Top TES α/β/γ by Test RMSE:")
display(resultsDf_c.head())
📊 Top TES α/β/γ by Test RMSE:
Alpha Beta Gamma Train RMSE Test RMSE
128 0.5 0.3 0.3 448.489627 412.613756
194 0.6 0.3 0.5 541.879579 478.752553
72 0.4 0.4 0.3 451.720449 485.608082
260 0.7 0.3 0.7 726.763349 561.382299
81 0.4 0.5 0.4 508.681964 579.245326
In [424]:
best_row = resultsDf_c.sort_values('Test RMSE', ascending=True).iloc[0]
a, b, g = float(best_row['Alpha']), float(best_row['Beta']), float(best_row['Gamma'])

# 2) Refit Holt-Winters with those parameters (no MultiIndex columns)
best_TES_model = ExponentialSmoothing(
    TES_train['Sparkling'],
    trend='add',
    seasonal='mul',
    seasonal_periods=12
).fit(
    smoothing_level=a,
    smoothing_trend=b,
    smoothing_seasonal=g,
    optimized=False
)

# 3) Forecast cleanly into a single column
TES_test['TES_bf_forecast'] = best_TES_model.forecast(len(TES_test))

# 4) Plot train, test, and the brute-force (best α/β/γ) forecast
plt.figure(figsize=(18,9))
plt.plot(TES_train['Sparkling'], label='Training Data', linewidth=2)
plt.plot(TES_test['Sparkling'], label='Actual Test Data', linewidth=2)
plt.plot(TES_test['TES_bf_forecast'],
         label=f'TES Forecast (α={a:.2f}, β={b:.2f}, γ={g:.2f})',
         linestyle='--', linewidth=2)

plt.title('Holt-Winters (TES) — Forecast vs Actual (Grid-Searched α,β,γ)', fontsize=16)
plt.xlabel('Time'); plt.ylabel('Sparkling Sales')
plt.legend(loc='best'); plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout(); plt.show()
No description has been provided for this image
In [425]:
# Extract the best performing row automatically
best_row = resultsDf_c.sort_values(by='Test RMSE').iloc[0]

best_alpha = best_row['Alpha']
best_beta  = best_row['Beta']
best_gamma = best_row['Gamma']
best_rmse  = best_row['Test RMSE']

# Create a readable model label based on actual best values
model_name = f"TES (α={best_alpha:.2f}, β={best_beta:.2f}, γ={best_gamma:.2f})"

# Add this result to the performance leaderboard
resultsDf_2 = pd.DataFrame({'Test RMSE': [best_rmse]}, index=[model_name])
resultsDf = pd.concat([resultsDf, resultsDf_2])

print("📊 Updated Model Performance Table:")
display(resultsDf)
📊 Updated Model Performance Table:
Test RMSE
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
SES (Alpha = 0.30) 1.900059e+03
DES_Model 2.654534e+03
DES (Alpha=0.30, Beta=0.30) 1.381440e+04
TES (α=0.50, β=0.30, γ=0.30) 4.126138e+02
In [426]:
# Sort the model performance results in ascending order of RMSE
# ✅ Lower RMSE means better model accuracy
resultsDf1 = resultsDf.sort_values(by='Test RMSE', ascending=True)

print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf1)
📊 Model Performance Ranking (Lowest RMSE = Best):
Test RMSE
TES (α=0.50, β=0.30, γ=0.30) 4.126138e+02
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
SES (Alpha = 0.30) 1.900059e+03
DES_Model 2.654534e+03
DES (Alpha=0.30, Beta=0.30) 1.381440e+04
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
Modified Format RMSE Values as Integers¶
In [427]:
# Sort performance results by Test RMSE (lowest = best)
resultsDf1 = resultsDf.sort_values(by='Test RMSE', ascending=True)

print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf1.style.format({'Test RMSE': '{:,.2f}'}))
📊 Model Performance Ranking (Lowest RMSE = Best):
  Test RMSE
TES (α=0.50, β=0.30, γ=0.30) 412.61
SimpleAverageModel 1,268.68
SES_Model 1,293.81
SES (Alpha = 0.30) 1,900.06
DES_Model 2,654.53
DES (Alpha=0.30, Beta=0.30) 13,814.40
Regression On Time Model 1,840,430.14
NaiveForecast 15,839,720.95

Observations¶

Naive Forecast Model

  • The Naive model predicts every future point as the last observed sales value.

  • Completely fails to capture seasonal spikes, resulting in a very high error (worst RMSE).

Simple Average Model

  • Forecast is flat, based on the long term average.

  • Slightly superior to Naive and yet does not capture seasonal variations, resulting in large errors during months of high peaks.

Simple Exponential Smoothing (SES)

  • Gives more weight to recent data but still ignores trend and seasonality.

  • Slightly outperforms the Simple Average model but is still inadequate when the seasonal patterns are strong.

Des (Double Exponential Smoothing)

  • It catches a linear trend but not seasonality.

  • Forecasts produce a declining trend that does not match real seasonal peaks.

  • Better than SES but still poor performance during high demand periods.

Triple Exponential Smoothing (TES / Holt Winters)

  • Successfully models trend + seasonality, closely following real values.

It smoothly matches the seasonal peaks and channels during the test period.

  • Best performance with lowest RMSE: most accurate model of all that were tested.

Conclusion:

  • Holt Winters TES is the best forecasting model of Sparkling Wine Sales, as it can capture a long-term trend and strong seasonal pattern with the lowest error rate.

Checking for Stationarity¶

In [428]:
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):

    #determining roll statistics
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()

    ##plot rolling Statistics:
    orig = plt.plot(timeseries,color='blue',label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean and Standard Deviation')
    plt.show(block=False)

    #Perform Dickey-Fuller Test:
    print('Results of Dickey Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    print(dfoutput,'\n')
In [429]:
def test_stationarity(ts, window=12, title="Rolling Mean & Std"):
    """
    Check stationarity of a time series using rolling stats + ADF test.

    Parameters
    ----------
    ts : pd.Series
        Univariate time series (indexed by datetime ideally).
    window : int, default=12
        Rolling window size (12 = monthly data, 1 year).
    title : str
        Title for the rolling statistics plot.

    Returns
    -------
    results : dict
        Dictionary with ADF test statistic, p-value, lags, nobs, and critical values.
    """
    # Ensure Series and drop missing values (ADF cannot handle NaN)
    ts = pd.Series(ts).dropna()

    # --- Rolling statistics
    rolmean = ts.rolling(window=window, min_periods=window).mean()
    rolstd  = ts.rolling(window=window, min_periods=window).std()

    # --- Plot
    plt.figure(figsize=(12, 5))
    plt.plot(ts, label='Original', linewidth=2)
    plt.plot(rolmean, label=f'Rolling Mean (win={window})', linestyle='--')
    plt.plot(rolstd, label=f'Rolling Std (win={window})', linestyle=':')
    plt.legend(loc='best')
    plt.title(title)
    plt.grid(True, linestyle='--', alpha=0.4)
    plt.tight_layout()
    plt.show()

    # --- ADF Test
    print("Results of Augmented Dickey-Fuller (ADF) Test")
    dftest = adfuller(ts.values, autolag='AIC')  # use values to avoid index issues
    adf_stat, pval, used_lag, nobs = dftest[0], dftest[1], dftest[2], dftest[3]
    crit_vals = dftest[4]

    # Pretty print
    print(f"Test Statistic : {adf_stat: .4f}")
    print(f"p-value        : {pval: .6f}")
    print(f"# Lags Used    : {used_lag}")
    print(f"# Observations : {nobs}")
    for k, v in crit_vals.items():
        print(f"Critical Value ({k}) : {v: .4f}")

    # Simple verdict
    verdict = "Stationary (reject H0)" if pval < 0.05 else "Non-stationary (fail to reject H0)"
    print(f"\nVerdict @ 5%  : {verdict}\n")

    return {
        "adf_stat": adf_stat,
        "p_value": pval,
        "used_lag": used_lag,
        "nobs": nobs,
        "critical_values": crit_vals,
        "verdict_5pct": verdict
    }
In [430]:
# ✅ Checking stationarity for the Training portion of Sparkling Sales
# This helps determine our differencing needs for SARIMA modeling
test_stationarity(df_train['Sparkling'])
No description has been provided for this image
Results of Augmented Dickey-Fuller (ADF) Test
Test Statistic : -1.2658
p-value        :  0.644683
# Lags Used    : 12
# Observations : 131
Critical Value (1%) : -3.4813
Critical Value (5%) : -2.8839
Critical Value (10%) : -2.5787

Verdict @ 5%  : Non-stationary (fail to reject H0)

Out[430]:
{'adf_stat': np.float64(-1.2657713392815138),
 'p_value': np.float64(0.6446829195512411),
 'used_lag': 12,
 'nobs': 131,
 'critical_values': {'1%': np.float64(-3.481281802271349),
  '5%': np.float64(-2.883867891664528),
  '10%': np.float64(-2.5786771965503177)},
 'verdict_5pct': 'Non-stationary (fail to reject H0)'}

Observations:

ADF Statistic: -1.2658

p-value: 0.6447

Critical Value @ 5%: -2.8839

Conclusion:

Fail to reject the null hypothesis (H0)

The p-value is significantly higher than 0.05 and the ADF statistic is less negative than the critical value, indicating that the Sparkling Wine Sales series is non-stationary. This confirms the presence of trend and seasonal effects observed in the earlier plots.

df_train Model

In [431]:
dftest = adfuller(df_train.diff().dropna(),regression='ct')
print('DF test statistics is %3.3f' %dftest[0])
print('DF test p-value is', dftest[1])
print('DF test p-value is', dftest[2])
DF test statistics is -8.628
DF test p-value is 2.5490597847949586e-12
DF test p-value is 11
In [432]:
# ADF on first-differenced training series (removes linear trend)
# regression='ct' includes constant + linear trend in the test regression
ts_diff1 = df_train['Sparkling'].diff().dropna()

dftest = adfuller(ts_diff1.values, regression='ct', autolag='AIC')

# Unpack results
adf_stat, pval, used_lag, nobs = dftest[0], dftest[1], dftest[2], dftest[3]
crit_vals = dftest[4]

# Pretty print
print("Results of Augmented Dickey-Fuller (ADF) on df_train ΔSparkling (regression='ct')")
print(f"Test Statistic : {adf_stat: .4f}")
print(f"p-value        : {pval: .6f}")
print(f"# Lags Used    : {used_lag}")
print(f"# Observations : {nobs}")
for k, v in crit_vals.items():
    print(f"Critical Value ({k}) : {v: .4f}")

# Verdict at 5%
verdict = "Stationary (reject H0)" if pval < 0.05 else "Non-stationary (fail to reject H0)"
print(f"\nVerdict @ 5%  : {verdict}")
Results of Augmented Dickey-Fuller (ADF) on df_train ΔSparkling (regression='ct')
Test Statistic : -8.6282
p-value        :  0.000000
# Lags Used    : 11
# Observations : 131
Critical Value (1%) : -4.0296
Critical Value (5%) : -3.4446
Critical Value (10%) : -3.1470

Verdict @ 5%  : Stationary (reject H0)
In [433]:
plt.figure(figsize=(14, 6))  # Bigger chart for better visibility

# Plot the training data for Sparkling Sales
df_train['Sparkling'].plot(linewidth=2, label='Training Sparkling Sales')

# Improve readability
plt.title('Training Data – Sparkling Wine Sales', fontsize=14)
plt.xlabel('Time', fontsize=12)
plt.ylabel('Sales', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='best')
plt.tight_layout()

plt.show()
No description has been provided for this image

Observations:

ADF Statistic: -8.6282

p-value: 0.0000

Critical Value @ 5%: -3.4446

conclusion:

Reject the null hypothesis (H₀)

After applying first differencing, the series becomes stationary. The low p-value (0.0000) and ADF statistic being much lower than the critical value indicate that the transformed data now has stable mean and variance, making it suitable for ARIMA/SARIMA modeling.

Automated ARIMA Model based on lowest AIC¶

In [434]:
import itertools

# Define the parameter ranges for ARIMA (p, d, q)
# p = AR order (number of lag observations included)
# d = differencing order (to make the series stationary)
# q = MA order (size of moving average window)
p = q = range(0, 4)   # p and q will each take values: 0,1,2,3
d = range(1, 2)       # d will only take value: 1  (based on stationarity test)

# Creating all combinations of (p,d,q) using Cartesian product
# itertools.product generates every possible tuple like (0,1,0), (1,1,2), etc.
pdq = list(itertools.product(p, d, q))

# Display the generated model parameter combinations
print('Examples of the parameter combinations for the ARIMA models')
for i in range(0, len(pdq)):
    # Format each tuple into a readable model notation
    print('Model : ARIMA{}'.format(pdq[i]))
Examples of the parameter combinations for the ARIMA models
Model : ARIMA(0, 1, 0)
Model : ARIMA(0, 1, 1)
Model : ARIMA(0, 1, 2)
Model : ARIMA(0, 1, 3)
Model : ARIMA(1, 1, 0)
Model : ARIMA(1, 1, 1)
Model : ARIMA(1, 1, 2)
Model : ARIMA(1, 1, 3)
Model : ARIMA(2, 1, 0)
Model : ARIMA(2, 1, 1)
Model : ARIMA(2, 1, 2)
Model : ARIMA(2, 1, 3)
Model : ARIMA(3, 1, 0)
Model : ARIMA(3, 1, 1)
Model : ARIMA(3, 1, 2)
Model : ARIMA(3, 1, 3)
In [435]:
# Creating an empty dataframe to store ARIMA model results
# Columns:
#   param → the (p, d, q) parameter combination tested for ARIMA
#   AIC   → the AIC score for each fitted model
ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])

# Display the initialized table
print("✅ Initialized ARIMA AIC results table:")
ARIMA_AIC
✅ Initialized ARIMA AIC results table:
Out[435]:
param AIC
In [436]:
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")  # Suppress convergence warnings for cleaner output

# Loop through each (p, d, q) combination in pdq
for param in pdq:
    try:
        # Fit ARIMA(p,d,q) model on training series
        model = ARIMA(df_train['Sparkling'], order=param).fit()

        # Print AIC for each model
        print(f"ARIMA{param} → AIC = {model.aic:.2f}")

        # Store model parameters and AIC into results dataframe
        ARIMA_AIC.loc[len(ARIMA_AIC)] = [param, model.aic]

    except Exception as e:
        # Some parameter combinations fail to converge → skip safely
        print(f"ARIMA{param} → ❌ Failed ({e})")
        continue
ARIMA(0, 1, 0) → AIC = 2476.75
ARIMA(0, 1, 1) → AIC = 2470.90
ARIMA(0, 1, 2) → AIC = 2439.47
ARIMA(0, 1, 3) → AIC = 2439.13
ARIMA(1, 1, 0) → AIC = 2474.92
ARIMA(1, 1, 1) → AIC = 2440.49
ARIMA(1, 1, 2) → AIC = 2439.71
ARIMA(1, 1, 3) → AIC = 2440.81
ARIMA(2, 1, 0) → AIC = 2468.71
ARIMA(2, 1, 1) → AIC = 2438.87
ARIMA(2, 1, 2) → AIC = 2419.17
ARIMA(2, 1, 3) → AIC = 2438.37
ARIMA(3, 1, 0) → AIC = 2466.35
ARIMA(3, 1, 1) → AIC = 2440.43
ARIMA(3, 1, 2) → AIC = 2435.48
ARIMA(3, 1, 3) → AIC = 2427.51
In [437]:
# Sort ARIMA results by AIC in ascending order (lowest AIC = best model)
best_ARIMA = ARIMA_AIC.sort_values(by='AIC', ascending=True).reset_index(drop=True)

# Display the top 5 best-performing ARIMA models based on AIC
print("📊 Top ARIMA(p,d,q) models ranked by lowest AIC:")
display(best_ARIMA.head().style.format({'AIC': '{:,.2f}'}))
📊 Top ARIMA(p,d,q) models ranked by lowest AIC:
  param AIC
0 (2, 1, 2) 2,419.17
1 (3, 1, 3) 2,427.51
2 (3, 1, 2) 2,435.48
3 (2, 1, 3) 2,438.37
4 (2, 1, 1) 2,438.87
In [438]:
# ✅ Fit the best ARIMA model found from AIC ranking
# Using only the target variable (Sparkling), not the full dataframe
auto_ARIMA = ARIMA(df_train['Sparkling'], order=(2,1,2))

# Fit the model to learn AR, I, MA parameters
results_auto_ARIMA = auto_ARIMA.fit()

# Display full model summary including AIC & coefficients
print("📌 ARIMA(2,1,2) Model Summary:\n")
print(results_auto_ARIMA.summary())
📌 ARIMA(2,1,2) Model Summary:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:              Sparkling   No. Observations:                  144
Model:                 ARIMA(2, 1, 2)   Log Likelihood               -1204.583
Date:                Sun, 02 Nov 2025   AIC                           2419.167
Time:                        13:44:37   BIC                           2433.981
Sample:                    01-31-1980   HQIC                          2425.187
                         - 12-31-1991                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.3214      0.043     30.410      0.000       1.236       1.407
ar.L2         -0.5501      0.062     -8.906      0.000      -0.671      -0.429
ma.L1         -1.9912      0.105    -18.931      0.000      -2.197      -1.785
ma.L2          0.9995      0.106      9.473      0.000       0.793       1.206
sigma2      1.136e+06   1.88e-07   6.05e+12      0.000    1.14e+06    1.14e+06
===================================================================================
Ljung-Box (L1) (Q):                   0.07   Jarque-Bera (JB):                15.12
Prob(Q):                              0.80   Prob(JB):                         0.00
Heteroskedasticity (H):               2.03   Skew:                             0.61
Prob(H) (two-sided):                  0.02   Kurtosis:                         4.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.15e+28. Standard errors may be unstable.
In [439]:
# Plotting ARIMA diagnostic plots to evaluate residual behavior
# This includes:
# ✅ Standardized residuals (look for randomness)
# ✅ Q-Q plot (check normal distribution)
# ✅ ACF of residuals (look for no auto-correlation)
# ✅ Histogram of residuals (normal bell curve)
results_auto_ARIMA.plot_diagnostics(figsize=(14, 10))
plt.suptitle("ARIMA Model Diagnostics", fontsize=14, y=1.02)
plt.show()
No description has been provided for this image

Predict on the Test Set using this model and evaluate the model.

In [440]:
# ✅ Forecast future values using the fitted ARIMA model
# steps=len(df_test) → predict the same number of periods as the Test dataset
predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(df_test))
In [441]:
## Mean Absolute Percentage Error (MAPE) - Function Definition

def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean((np.abs(y_true-y_pred))/(y_true))*100

## Importing the mean_squared_error function from sklearn to calculate the RMSE
In [442]:
# ✅ MAPE (Mean Absolute Percentage Error) function
# Measures the forecast error as a percentage of the actual values
# multiplied by 100 to express as a % error
# np.clip is used to avoid division by zero issues if any actual value = 0
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)  # ensure arrays
    return np.mean(np.abs((y_true - y_pred) / np.clip(y_true, 1e-10, None))) * 100
    #   ^ avoid 0 division by replacing zeros with tiny number (1e-10)

# ✅ mean_squared_error imported to compute RMSE:
# RMSE = sqrt(MSE)
# Lower RMSE → better forecast accuracy
In [443]:
# ✅ Evaluate ARIMA(2,1,2) Forecast Performance

# RMSE → Root Mean Squared Error
# Measures forecast error in the same units as sales.
# Lower RMSE = more accurate forecast.
rmse = root_mean_squared_error(df_test['Sparkling'], predicted_auto_ARIMA)

# MAPE → Mean Absolute Percentage Error
# Measures the average percentage difference between forecast and actual values.
# Lower MAPE (%) = better forecast quality relative to actual scale.
mape = mean_absolute_percentage_error(df_test['Sparkling'], predicted_auto_ARIMA)

# ✅ Display results with formatted text for readability
print(f"📌 ARIMA(2,1,2) — RMSE: {rmse:,.2f}  |  MAPE: {mape:.2f}%")
📌 ARIMA(2,1,2) — RMSE: 1,309.63  |  MAPE: 42.11%
In [444]:
from math import sqrt
from sklearn.metrics import mean_squared_error

# ✅ RMSE: Root Mean Squared Error calculation
# mean_squared_error → returns MSE
# sqrt(MSE)          → converts MSE to RMSE (original units of sales)
rmse = sqrt(mean_squared_error(df_test['Sparkling'], predicted_auto_ARIMA))

# Display the RMSE result rounded for clarity
print(f"📌 ARIMA(2,1,2) — RMSE: {rmse:,.2f}")
📌 ARIMA(2,1,2) — RMSE: 1,309.63
In [445]:
# ✅ Create a performance table (DataFrame) to store forecast accuracy metrics
# rmse  → Root Mean Squared Error (lower = better accuracy)
# mape → Mean Absolute Percentage Error (lower % = better accuracy)
# index → Model name for easy comparison later
resultsDf = pd.DataFrame(
    {'RMSE': rmse, 'MAPE': mape},
    index=['ARIMA(2,1,2)']
)

# ✅ Display the results table for ARIMA model
resultsDf
Out[445]:
RMSE MAPE
ARIMA(2,1,2) 1309.631475 42.106197

Observations:

  • Coefficients are mostly statistically significant

  • Diagnostic plots show residuals are close to normal

  • No major issues of autocorrelation left in residuals

  • The model fits reasonably well.

  • Slight skew & kurtosis indicate model can still be improved

  • ARIMA(2,1,2) provided lower forecast error and residuals that are better behaved, hence it catches the underlying trend better and thus is a reliable model for this series.

Automated SARIMA Model based on lowest AIC¶

In [446]:
# ✅ Autocorrelation plot on first-differenced training data
# First difference removes trend → helps us inspect remaining correlations
# missing='drop' handles the NaN created by differencing
# lags=40 shows the first 40 lag correlations
plot_acf(df_train['Sparkling'].diff().dropna(),
         title='Training Data Autocorrelation (After First Difference)',
         lags=40);

plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
No description has been provided for this image
In [447]:
import itertools

# ✅ Define parameter ranges for ARIMA(p,d,q)
# p = AR order → number of autoregressive lags
# d = differencing order → to remove trend (we already found d=1)
# q = MA order → number of lagged forecast errors in the model
p = q = range(0, 4)   # p and q will take values 0,1,2,3
d = range(1, 2)       # d = 1 based on ADF test

# ✅ Define seasonal parameter ranges for SARIMA(P,D,Q,s)
# P = seasonal AR order
# D = seasonal differencing (start with 0 since we allow the model to choose)
# Q = seasonal MA order
# s = 12 months (seasonality period = yearly cycles)
D = range(0, 1)       # allowing D = 0 initially (we can later test D=1)
pdq = list(itertools.product(p, d, q))  # non-seasonal combos
PDQ = [(x[0], x[1], x[2], 12)           # seasonal combos with period=12
       for x in list(itertools.product(p, D, q))]

# ✅ Print sample model combinations
print('Examples of the parameter combinations for the SARIMA Model:')
for i in range(min(10, len(pdq))):     # show first 10 only for brevity
    print(f"SARIMA order: {pdq[i]}  x  Seasonal order: {PDQ[i]}")
Examples of the parameter combinations for the SARIMA Model:
SARIMA order: (0, 1, 0)  x  Seasonal order: (0, 0, 0, 12)
SARIMA order: (0, 1, 1)  x  Seasonal order: (0, 0, 1, 12)
SARIMA order: (0, 1, 2)  x  Seasonal order: (0, 0, 2, 12)
SARIMA order: (0, 1, 3)  x  Seasonal order: (0, 0, 3, 12)
SARIMA order: (1, 1, 0)  x  Seasonal order: (1, 0, 0, 12)
SARIMA order: (1, 1, 1)  x  Seasonal order: (1, 0, 1, 12)
SARIMA order: (1, 1, 2)  x  Seasonal order: (1, 0, 2, 12)
SARIMA order: (1, 1, 3)  x  Seasonal order: (1, 0, 3, 12)
SARIMA order: (2, 1, 0)  x  Seasonal order: (2, 0, 0, 12)
SARIMA order: (2, 1, 1)  x  Seasonal order: (2, 0, 1, 12)
In [448]:
# ✅ Create an empty DataFrame to store SARIMA model performance
# Columns include:
# param    → (p,d,q) non-seasonal parameters
# seasonal → (P,D,Q,s) seasonal parameters (s = seasonal period)
# AIC      → Akaike Information Criterion (lower = better model fit)
SARIMA_AIC = pd.DataFrame(columns=['param', 'seasonal', 'AIC'])

# Display the empty tracking table
SARIMA_AIC
Out[448]:
param seasonal AIC
In [449]:
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore")  # Suppress convergence warnings

# ✅ Loop through all (p,d,q) and seasonal (P,D,Q,s) combinations
for param in pdq:
    for param_seasonal in PDQ:
        try:
            # Build SARIMA model
            SARIMA_model = sm.tsa.statespace.SARIMAX(
                df_train['Sparkling'],
                order=param,
                seasonal_order=param_seasonal,
                enforce_stationarity=False,
                enforce_invertibility=False
            )

            # Fit model (maxiter increases chance of convergence)
            results_SARIMA = SARIMA_model.fit(maxiter=1000)

            # Print progress
            print(f"SARIMA{param} x {param_seasonal} → AIC = {results_SARIMA.aic:.2f}")

            # ✅ Save results in the AIC comparison table
            SARIMA_AIC.loc[len(SARIMA_AIC)] = [param, param_seasonal, results_SARIMA.aic]

        except Exception as e:
            # Prevent loop crash if a model fails to converge
            print(f"SARIMA{param} x {param_seasonal} → ❌ Failed ({str(e)[:50]})")
            continue
SARIMA(0, 1, 0) x (0, 0, 0, 12) → AIC = 2460.44
SARIMA(0, 1, 0) x (0, 0, 1, 12) → AIC = 2153.45
SARIMA(0, 1, 0) x (0, 0, 2, 12) → AIC = 1913.33
SARIMA(0, 1, 0) x (0, 0, 3, 12) → AIC = 3893.93
SARIMA(0, 1, 0) x (1, 0, 0, 12) → AIC = 2019.72
SARIMA(0, 1, 0) x (1, 0, 1, 12) → AIC = 1989.78
SARIMA(0, 1, 0) x (1, 0, 2, 12) → AIC = 1814.41
SARIMA(0, 1, 0) x (1, 0, 3, 12) → AIC = 3400.80
SARIMA(0, 1, 0) x (2, 0, 0, 12) → AIC = 1830.01
SARIMA(0, 1, 0) x (2, 0, 1, 12) → AIC = 1830.00
SARIMA(0, 1, 0) x (2, 0, 2, 12) → AIC = 1815.80
SARIMA(0, 1, 0) x (2, 0, 3, 12) → AIC = 3929.08
SARIMA(0, 1, 0) x (3, 0, 0, 12) → AIC = 1651.61
SARIMA(0, 1, 0) x (3, 0, 1, 12) → AIC = 1653.48
SARIMA(0, 1, 0) x (3, 0, 2, 12) → AIC = 1654.87
SARIMA(0, 1, 0) x (3, 0, 3, 12) → AIC = 3403.64
SARIMA(0, 1, 1) x (0, 0, 0, 12) → AIC = 2437.95
SARIMA(0, 1, 1) x (0, 0, 1, 12) → AIC = 2122.82
SARIMA(0, 1, 1) x (0, 0, 2, 12) → AIC = 1879.93
SARIMA(0, 1, 1) x (0, 0, 3, 12) → AIC = 4581.26
SARIMA(0, 1, 1) x (1, 0, 0, 12) → AIC = 1972.08
SARIMA(0, 1, 1) x (1, 0, 1, 12) → AIC = 1917.64
SARIMA(0, 1, 1) x (1, 0, 2, 12) → AIC = 1749.38
SARIMA(0, 1, 1) x (1, 0, 3, 12) → AIC = 3432.61
SARIMA(0, 1, 1) x (2, 0, 0, 12) → AIC = 1782.29
SARIMA(0, 1, 1) x (2, 0, 1, 12) → AIC = 1778.61
SARIMA(0, 1, 1) x (2, 0, 2, 12) → AIC = 1749.81
SARIMA(0, 1, 1) x (2, 0, 3, 12) → AIC = 3399.29
SARIMA(0, 1, 1) x (3, 0, 0, 12) → AIC = 1606.80
SARIMA(0, 1, 1) x (3, 0, 1, 12) → AIC = 1608.41
SARIMA(0, 1, 1) x (3, 0, 2, 12) → AIC = 1609.48
SARIMA(0, 1, 1) x (3, 0, 3, 12) → AIC = 3417.83
SARIMA(0, 1, 2) x (0, 0, 0, 12) → AIC = 2392.74
SARIMA(0, 1, 2) x (0, 0, 1, 12) → AIC = 2086.23
SARIMA(0, 1, 2) x (0, 0, 2, 12) → AIC = 1847.17
SARIMA(0, 1, 2) x (0, 0, 3, 12) → AIC = 2888.92
SARIMA(0, 1, 2) x (1, 0, 0, 12) → AIC = 1964.42
SARIMA(0, 1, 2) x (1, 0, 1, 12) → AIC = 1901.33
SARIMA(0, 1, 2) x (1, 0, 2, 12) → AIC = 1733.98
SARIMA(0, 1, 2) x (1, 0, 3, 12) → AIC = 4110.47
SARIMA(0, 1, 2) x (2, 0, 0, 12) → AIC = 1777.82
SARIMA(0, 1, 2) x (2, 0, 1, 12) → AIC = 1777.17
SARIMA(0, 1, 2) x (2, 0, 2, 12) → AIC = 1734.45
SARIMA(0, 1, 2) x (2, 0, 3, 12) → AIC = 4056.90
SARIMA(0, 1, 2) x (3, 0, 0, 12) → AIC = 1604.49
SARIMA(0, 1, 2) x (3, 0, 1, 12) → AIC = 1606.35
SARIMA(0, 1, 2) x (3, 0, 2, 12) → AIC = 1607.99
SARIMA(0, 1, 2) x (3, 0, 3, 12) → AIC = 4057.03
SARIMA(0, 1, 3) x (0, 0, 0, 12) → AIC = 2374.68
SARIMA(0, 1, 3) x (0, 0, 1, 12) → AIC = 2072.04
SARIMA(0, 1, 3) x (0, 0, 2, 12) → AIC = 1831.95
SARIMA(0, 1, 3) x (0, 0, 3, 12) → AIC = 4082.66
SARIMA(0, 1, 3) x (1, 0, 0, 12) → AIC = 1966.34
SARIMA(0, 1, 3) x (1, 0, 1, 12) → AIC = 1887.32
SARIMA(0, 1, 3) x (1, 0, 2, 12) → AIC = 1719.12
SARIMA(0, 1, 3) x (1, 0, 3, 12) → AIC = 3522.85
SARIMA(0, 1, 3) x (2, 0, 0, 12) → AIC = 1778.50
SARIMA(0, 1, 3) x (2, 0, 1, 12) → AIC = 1777.58
SARIMA(0, 1, 3) x (2, 0, 2, 12) → AIC = 1719.95
SARIMA(0, 1, 3) x (2, 0, 3, 12) → AIC = 3777.39
SARIMA(0, 1, 3) x (3, 0, 0, 12) → AIC = 1604.84
SARIMA(0, 1, 3) x (3, 0, 1, 12) → AIC = 1615.14
SARIMA(0, 1, 3) x (3, 0, 2, 12) → AIC = 1607.90
SARIMA(0, 1, 3) x (3, 0, 3, 12) → AIC = 3858.20
SARIMA(1, 1, 0) x (0, 0, 0, 12) → AIC = 2458.62
SARIMA(1, 1, 0) x (0, 0, 1, 12) → AIC = 2151.91
SARIMA(1, 1, 0) x (0, 0, 2, 12) → AIC = 1910.75
SARIMA(1, 1, 0) x (0, 0, 3, 12) → AIC = 3445.34
SARIMA(1, 1, 0) x (1, 0, 0, 12) → AIC = 1992.39
SARIMA(1, 1, 0) x (1, 0, 1, 12) → AIC = 1972.80
SARIMA(1, 1, 0) x (1, 0, 2, 12) → AIC = 1800.78
SARIMA(1, 1, 0) x (1, 0, 3, 12) → AIC = 4164.39
SARIMA(1, 1, 0) x (2, 0, 0, 12) → AIC = 1803.45
SARIMA(1, 1, 0) x (2, 0, 1, 12) → AIC = 1801.44
SARIMA(1, 1, 0) x (2, 0, 2, 12) → AIC = 1801.13
SARIMA(1, 1, 0) x (2, 0, 3, 12) → AIC = 4129.92
SARIMA(1, 1, 0) x (3, 0, 0, 12) → AIC = 1625.52
SARIMA(1, 1, 0) x (3, 0, 1, 12) → AIC = 1627.42
SARIMA(1, 1, 0) x (3, 0, 2, 12) → AIC = 1628.97
SARIMA(1, 1, 0) x (3, 0, 3, 12) → AIC = 4148.10
SARIMA(1, 1, 1) x (0, 0, 0, 12) → AIC = 2410.09
SARIMA(1, 1, 1) x (0, 0, 1, 12) → AIC = 2104.01
SARIMA(1, 1, 1) x (0, 0, 2, 12) → AIC = 1864.14
SARIMA(1, 1, 1) x (0, 0, 3, 12) → AIC = 12.00
SARIMA(1, 1, 1) x (1, 0, 0, 12) → AIC = 1949.81
SARIMA(1, 1, 1) x (1, 0, 1, 12) → AIC = 1917.69
SARIMA(1, 1, 1) x (1, 0, 2, 12) → AIC = 1748.91
SARIMA(1, 1, 1) x (1, 0, 3, 12) → AIC = 3677.74
SARIMA(1, 1, 1) x (2, 0, 0, 12) → AIC = 1765.44
SARIMA(1, 1, 1) x (2, 0, 1, 12) → AIC = 1821.52
SARIMA(1, 1, 1) x (2, 0, 2, 12) → AIC = 1749.65
SARIMA(1, 1, 1) x (2, 0, 3, 12) → AIC = 3568.12
SARIMA(1, 1, 1) x (3, 0, 0, 12) → AIC = 1591.61
SARIMA(1, 1, 1) x (3, 0, 1, 12) → AIC = 1593.44
SARIMA(1, 1, 1) x (3, 0, 2, 12) → AIC = 1595.09
SARIMA(1, 1, 1) x (3, 0, 3, 12) → AIC = 3571.64
SARIMA(1, 1, 2) x (0, 0, 0, 12) → AIC = 2389.71
SARIMA(1, 1, 2) x (0, 0, 1, 12) → AIC = 2087.93
SARIMA(1, 1, 2) x (0, 0, 2, 12) → AIC = 1846.45
SARIMA(1, 1, 2) x (0, 0, 3, 12) → AIC = 3791.72
SARIMA(1, 1, 2) x (1, 0, 0, 12) → AIC = 1945.43
SARIMA(1, 1, 2) x (1, 0, 1, 12) → AIC = 1900.84
SARIMA(1, 1, 2) x (1, 0, 2, 12) → AIC = 1732.03
SARIMA(1, 1, 2) x (1, 0, 3, 12) → AIC = 3727.07
SARIMA(1, 1, 2) x (2, 0, 0, 12) → AIC = 1762.33
SARIMA(1, 1, 2) x (2, 0, 1, 12) → AIC = 1761.83
SARIMA(1, 1, 2) x (2, 0, 2, 12) → AIC = 1732.95
SARIMA(1, 1, 2) x (2, 0, 3, 12) → AIC = 3493.14
SARIMA(1, 1, 2) x (3, 0, 0, 12) → AIC = 1589.31
SARIMA(1, 1, 2) x (3, 0, 1, 12) → AIC = 1591.10
SARIMA(1, 1, 2) x (3, 0, 2, 12) → AIC = 1592.94
SARIMA(1, 1, 2) x (3, 0, 3, 12) → AIC = 3427.93
SARIMA(1, 1, 3) x (0, 0, 0, 12) → AIC = 2374.98
SARIMA(1, 1, 3) x (0, 0, 1, 12) → AIC = 2069.10
SARIMA(1, 1, 3) x (0, 0, 2, 12) → AIC = 1827.84
SARIMA(1, 1, 3) x (0, 0, 3, 12) → AIC = 343.25
SARIMA(1, 1, 3) x (1, 0, 0, 12) → AIC = 1947.46
SARIMA(1, 1, 3) x (1, 0, 1, 12) → AIC = 1888.86
SARIMA(1, 1, 3) x (1, 0, 2, 12) → AIC = 1719.19
SARIMA(1, 1, 3) x (1, 0, 3, 12) → AIC = 4050.89
SARIMA(1, 1, 3) x (2, 0, 0, 12) → AIC = 1764.26
SARIMA(1, 1, 3) x (2, 0, 1, 12) → AIC = 1763.77
SARIMA(1, 1, 3) x (2, 0, 2, 12) → AIC = 1720.32
SARIMA(1, 1, 3) x (2, 0, 3, 12) → AIC = 3740.28
SARIMA(1, 1, 3) x (3, 0, 0, 12) → AIC = 1591.31
SARIMA(1, 1, 3) x (3, 0, 1, 12) → AIC = 1595.51
SARIMA(1, 1, 3) x (3, 0, 2, 12) → AIC = 1594.98
SARIMA(1, 1, 3) x (3, 0, 3, 12) → AIC = 3359.37
SARIMA(2, 1, 0) x (0, 0, 0, 12) → AIC = 2435.65
SARIMA(2, 1, 0) x (0, 0, 1, 12) → AIC = 2144.87
SARIMA(2, 1, 0) x (0, 0, 2, 12) → AIC = 1901.48
SARIMA(2, 1, 0) x (0, 0, 3, 12) → AIC = 4199.17
SARIMA(2, 1, 0) x (1, 0, 0, 12) → AIC = 1961.62
SARIMA(2, 1, 0) x (1, 0, 1, 12) → AIC = 1941.61
SARIMA(2, 1, 0) x (1, 0, 2, 12) → AIC = 1784.97
SARIMA(2, 1, 0) x (1, 0, 3, 12) → AIC = 3612.58
SARIMA(2, 1, 0) x (2, 0, 0, 12) → AIC = 1774.11
SARIMA(2, 1, 0) x (2, 0, 1, 12) → AIC = 1772.34
SARIMA(2, 1, 0) x (2, 0, 2, 12) → AIC = 1770.76
SARIMA(2, 1, 0) x (2, 0, 3, 12) → AIC = 3068.95
SARIMA(2, 1, 0) x (3, 0, 0, 12) → AIC = 1596.95
SARIMA(2, 1, 0) x (3, 0, 1, 12) → AIC = 1598.62
SARIMA(2, 1, 0) x (3, 0, 2, 12) → AIC = 1599.19
SARIMA(2, 1, 0) x (3, 0, 3, 12) → AIC = 3548.30
SARIMA(2, 1, 1) x (0, 0, 0, 12) → AIC = 2408.43
SARIMA(2, 1, 1) x (0, 0, 1, 12) → AIC = 2103.53
SARIMA(2, 1, 1) x (0, 0, 2, 12) → AIC = 1862.25
SARIMA(2, 1, 1) x (0, 0, 3, 12) → AIC = 2743.57
SARIMA(2, 1, 1) x (1, 0, 0, 12) → AIC = 1973.64
SARIMA(2, 1, 1) x (1, 0, 1, 12) → AIC = 1917.21
SARIMA(2, 1, 1) x (1, 0, 2, 12) → AIC = 1748.75
SARIMA(2, 1, 1) x (1, 0, 3, 12) → AIC = 3068.07
SARIMA(2, 1, 1) x (2, 0, 0, 12) → AIC = 1751.70
SARIMA(2, 1, 1) x (2, 0, 1, 12) → AIC = 1750.03
SARIMA(2, 1, 1) x (2, 0, 2, 12) → AIC = 1749.38
SARIMA(2, 1, 1) x (2, 0, 3, 12) → AIC = 3101.90
SARIMA(2, 1, 1) x (3, 0, 0, 12) → AIC = 1578.22
SARIMA(2, 1, 1) x (3, 0, 1, 12) → AIC = 1580.21
SARIMA(2, 1, 1) x (3, 0, 2, 12) → AIC = 1601.18
SARIMA(2, 1, 1) x (3, 0, 3, 12) → AIC = 3964.28
SARIMA(2, 1, 2) x (0, 0, 0, 12) → AIC = 2370.98
SARIMA(2, 1, 2) x (0, 0, 1, 12) → AIC = 2088.18
SARIMA(2, 1, 2) x (0, 0, 2, 12) → AIC = 1848.44
SARIMA(2, 1, 2) x (0, 0, 3, 12) → AIC = 3697.94
SARIMA(2, 1, 2) x (1, 0, 0, 12) → AIC = 1931.14
SARIMA(2, 1, 2) x (1, 0, 1, 12) → AIC = 1902.82
SARIMA(2, 1, 2) x (1, 0, 2, 12) → AIC = 1734.03
SARIMA(2, 1, 2) x (1, 0, 3, 12) → AIC = 3901.77
SARIMA(2, 1, 2) x (2, 0, 0, 12) → AIC = 1750.24
SARIMA(2, 1, 2) x (2, 0, 1, 12) → AIC = 1750.02
SARIMA(2, 1, 2) x (2, 0, 2, 12) → AIC = 1737.61
SARIMA(2, 1, 2) x (2, 0, 3, 12) → AIC = 3828.52
SARIMA(2, 1, 2) x (3, 0, 0, 12) → AIC = 1577.10
SARIMA(2, 1, 2) x (3, 0, 1, 12) → AIC = 1578.93
SARIMA(2, 1, 2) x (3, 0, 2, 12) → AIC = 1580.70
SARIMA(2, 1, 2) x (3, 0, 3, 12) → AIC = 3561.93
SARIMA(2, 1, 3) x (0, 0, 0, 12) → AIC = 2377.96
SARIMA(2, 1, 3) x (0, 0, 1, 12) → AIC = 2060.15
SARIMA(2, 1, 3) x (0, 0, 2, 12) → AIC = 1823.64
SARIMA(2, 1, 3) x (0, 0, 3, 12) → AIC = 3298.59
SARIMA(2, 1, 3) x (1, 0, 0, 12) → AIC = 1930.93
SARIMA(2, 1, 3) x (1, 0, 1, 12) → AIC = 1890.95
SARIMA(2, 1, 3) x (1, 0, 2, 12) → AIC = 1717.16
SARIMA(2, 1, 3) x (1, 0, 3, 12) → AIC = 4154.31
SARIMA(2, 1, 3) x (2, 0, 0, 12) → AIC = 1751.95
SARIMA(2, 1, 3) x (2, 0, 1, 12) → AIC = 1747.48
SARIMA(2, 1, 3) x (2, 0, 2, 12) → AIC = 1733.98
SARIMA(2, 1, 3) x (2, 0, 3, 12) → AIC = 3600.29
SARIMA(2, 1, 3) x (3, 0, 0, 12) → AIC = 1578.85
SARIMA(2, 1, 3) x (3, 0, 1, 12) → AIC = 1580.68
SARIMA(2, 1, 3) x (3, 0, 2, 12) → AIC = 1582.19
SARIMA(2, 1, 3) x (3, 0, 3, 12) → AIC = 3004.08
SARIMA(3, 1, 0) x (0, 0, 0, 12) → AIC = 2417.01
SARIMA(3, 1, 0) x (0, 0, 1, 12) → AIC = 2144.79
SARIMA(3, 1, 0) x (0, 0, 2, 12) → AIC = 1899.15
SARIMA(3, 1, 0) x (0, 0, 3, 12) → AIC = 4586.68
SARIMA(3, 1, 0) x (1, 0, 0, 12) → AIC = 1943.05
SARIMA(3, 1, 0) x (1, 0, 1, 12) → AIC = 1924.15
SARIMA(3, 1, 0) x (1, 0, 2, 12) → AIC = 1783.33
SARIMA(3, 1, 0) x (1, 0, 3, 12) → AIC = 4018.13
SARIMA(3, 1, 0) x (2, 0, 0, 12) → AIC = 1759.51
SARIMA(3, 1, 0) x (2, 0, 1, 12) → AIC = 1756.89
SARIMA(3, 1, 0) x (2, 0, 2, 12) → AIC = 1755.54
SARIMA(3, 1, 0) x (2, 0, 3, 12) → AIC = 3862.65
SARIMA(3, 1, 0) x (3, 0, 0, 12) → AIC = 1580.85
SARIMA(3, 1, 0) x (3, 0, 1, 12) → AIC = 1582.24
SARIMA(3, 1, 0) x (3, 0, 2, 12) → AIC = 1582.43
SARIMA(3, 1, 0) x (3, 0, 3, 12) → AIC = 3520.15
SARIMA(3, 1, 1) x (0, 0, 0, 12) → AIC = 2390.31
SARIMA(3, 1, 1) x (0, 0, 1, 12) → AIC = 2105.43
SARIMA(3, 1, 1) x (0, 0, 2, 12) → AIC = 1864.20
SARIMA(3, 1, 1) x (0, 0, 3, 12) → AIC = 3711.77
SARIMA(3, 1, 1) x (1, 0, 0, 12) → AIC = 1920.57
SARIMA(3, 1, 1) x (1, 0, 1, 12) → AIC = 1904.43
SARIMA(3, 1, 1) x (1, 0, 2, 12) → AIC = 1750.58
SARIMA(3, 1, 1) x (1, 0, 3, 12) → AIC = 3557.50
SARIMA(3, 1, 1) x (2, 0, 0, 12) → AIC = 1739.44
SARIMA(3, 1, 1) x (2, 0, 1, 12) → AIC = 1737.97
SARIMA(3, 1, 1) x (2, 0, 2, 12) → AIC = 1761.06
SARIMA(3, 1, 1) x (2, 0, 3, 12) → AIC = 3103.90
SARIMA(3, 1, 1) x (3, 0, 0, 12) → AIC = 1563.91
SARIMA(3, 1, 1) x (3, 0, 1, 12) → AIC = 1565.56
SARIMA(3, 1, 1) x (3, 0, 2, 12) → AIC = 1566.58
SARIMA(3, 1, 1) x (3, 0, 3, 12) → AIC = 3762.19
SARIMA(3, 1, 2) x (0, 0, 0, 12) → AIC = 2392.19
SARIMA(3, 1, 2) x (0, 0, 1, 12) → AIC = 2085.22
SARIMA(3, 1, 2) x (0, 0, 2, 12) → AIC = 1845.68
SARIMA(3, 1, 2) x (0, 0, 3, 12) → AIC = 2365.36
SARIMA(3, 1, 2) x (1, 0, 0, 12) → AIC = 1960.39
SARIMA(3, 1, 2) x (1, 0, 1, 12) → AIC = 1904.81
SARIMA(3, 1, 2) x (1, 0, 2, 12) → AIC = 1736.00
SARIMA(3, 1, 2) x (1, 0, 3, 12) → AIC = 2391.01
SARIMA(3, 1, 2) x (2, 0, 0, 12) → AIC = 1738.32
SARIMA(3, 1, 2) x (2, 0, 1, 12) → AIC = 1737.25
SARIMA(3, 1, 2) x (2, 0, 2, 12) → AIC = 1736.93
SARIMA(3, 1, 2) x (2, 0, 3, 12) → AIC = 2306.62
SARIMA(3, 1, 2) x (3, 0, 0, 12) → AIC = 1563.14
SARIMA(3, 1, 2) x (3, 0, 1, 12) → AIC = 1567.57
SARIMA(3, 1, 2) x (3, 0, 2, 12) → AIC = 1566.79
SARIMA(3, 1, 2) x (3, 0, 3, 12) → AIC = 2312.64
SARIMA(3, 1, 3) x (0, 0, 0, 12) → AIC = 2354.17
SARIMA(3, 1, 3) x (0, 0, 1, 12) → AIC = 2074.24
SARIMA(3, 1, 3) x (0, 0, 2, 12) → AIC = 1835.03
SARIMA(3, 1, 3) x (0, 0, 3, 12) → AIC = 8068.02
SARIMA(3, 1, 3) x (1, 0, 0, 12) → AIC = 1914.97
SARIMA(3, 1, 3) x (1, 0, 1, 12) → AIC = 1892.77
SARIMA(3, 1, 3) x (1, 0, 2, 12) → AIC = 1722.21
SARIMA(3, 1, 3) x (1, 0, 3, 12) → AIC = 1764.21
SARIMA(3, 1, 3) x (2, 0, 0, 12) → AIC = 1739.98
SARIMA(3, 1, 3) x (2, 0, 1, 12) → AIC = 1735.67
SARIMA(3, 1, 3) x (2, 0, 2, 12) → AIC = 1729.00
SARIMA(3, 1, 3) x (2, 0, 3, 12) → AIC = 1849.69
SARIMA(3, 1, 3) x (3, 0, 0, 12) → AIC = 1565.11
SARIMA(3, 1, 3) x (3, 0, 1, 12) → AIC = 1567.01
SARIMA(3, 1, 3) x (3, 0, 2, 12) → AIC = 1568.79
SARIMA(3, 1, 3) x (3, 0, 3, 12) → AIC = 109.39
In [450]:
# ✅ Sort SARIMA results by AIC in ascending order (lowest AIC = best model fit)
SARIMA_AIC_sorted = SARIMA_AIC.sort_values(by='AIC', ascending=True).reset_index(drop=True)

print("📊 Top SARIMA(p,d,q) × (P,D,Q,12) models ranked by lowest AIC:")
display(SARIMA_AIC_sorted.head().style.format({'AIC': '{:,.2f}'}))
📊 Top SARIMA(p,d,q) × (P,D,Q,12) models ranked by lowest AIC:
  param seasonal AIC
0 (1, 1, 1) (0, 0, 3, 12) 12.00
1 (3, 1, 3) (3, 0, 3, 12) 109.39
2 (1, 1, 3) (0, 0, 3, 12) 343.25
3 (3, 1, 2) (3, 0, 0, 12) 1,563.14
4 (3, 1, 1) (3, 0, 0, 12) 1,563.91
In [451]:
import statsmodels.api as sm

# ✅ Train the Best SARIMA Model based on AIC ranking
# order = (3,1,2) → Non-seasonal ARIMA(p,d,q)
# seasonal_order = (1,0,3,12) → Seasonal (P,D,Q,s) with s=12 (monthly seasonality)
auto_SARIMA = sm.tsa.statespace.SARIMAX(
    df_train['Sparkling'],
    order=(3,1,2),
    seasonal_order=(1,0,3,12),
    enforce_stationarity=False,   # Allow model flexibility even if not stationary
    enforce_invertibility=False   # Prevent convergence errors
)

# ✅ Fit the model; maxiter increased to allow proper convergence on complex seasonal models
results_auto_SARIMA = auto_SARIMA.fit(maxiter=1000)

# ✅ Display detailed model summary including AIC, BIC, coefficients, and diagnostics
print("📌 SARIMA(3,1,2) × (1,0,3,12) Model Summary:\n")
print(results_auto_SARIMA.summary())
📌 SARIMA(3,1,2) × (1,0,3,12) Model Summary:

                                         SARIMAX Results                                          
==================================================================================================
Dep. Variable:                                  Sparkling   No. Observations:                  144
Model:             SARIMAX(3, 1, 2)x(1, 0, [1, 2, 3], 12)   Log Likelihood               -1185.506
Date:                                    Sun, 02 Nov 2025   AIC                           2391.011
Time:                                            14:00:13   BIC                           2417.455
Sample:                                        01-31-1980   HQIC                          2401.725
                                             - 12-31-1991                                         
Covariance Type:                                      opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -1.0985     77.217     -0.014      0.989    -152.440     150.243
ar.L2          0.2057   2171.193   9.47e-05      1.000   -4255.254    4255.666
ar.L3         -0.2747     48.280     -0.006      0.995     -94.902      94.353
ma.L1          0.8415      0.005    162.681      0.000       0.831       0.852
ma.L2         -0.7274     10.834     -0.067      0.946     -21.962      20.507
ar.S.L12       0.9089     51.078      0.018      0.986     -99.201     101.019
ma.S.L12    -1.26e+13    3.1e-06  -4.06e+18      0.000   -1.26e+13   -1.26e+13
ma.S.L24    1.397e+13    2.8e-11   4.99e+23      0.000     1.4e+13     1.4e+13
ma.S.L36    2.074e+12   2.06e-12   1.01e+24      0.000    2.07e+12    2.07e+12
sigma2      1.779e+06      2.570   6.92e+05      0.000    1.78e+06    1.78e+06
===================================================================================
Ljung-Box (L1) (Q):                   9.88   Jarque-Bera (JB):             29215.95
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.00   Skew:                             8.44
Prob(H) (two-sided):                  0.00   Kurtosis:                        83.36
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.24e+46. Standard errors may be unstable.
In [452]:
# ✅ Residual diagnostics for SARIMA model
# This checks whether the model errors behave like white noise
results_auto_SARIMA.plot_diagnostics(figsize=(14, 10))

plt.suptitle("SARIMA Model Residual Diagnostics", fontsize=14, y=1.02)
plt.show()
No description has been provided for this image
In [453]:
# ✅ Forecasting the same number of future periods as the Test dataset
# get_forecast() produces point forecasts + prediction intervals
predicted_auto_SARIMA = results_auto_SARIMA.get_forecast(steps=len(df_test))
In [454]:
# ✅ Convert SARIMA forecast results into a summary DataFrame
# summary_frame(alpha=0.05) returns:
# - mean          → Point forecast
# - mean_se       → Standard error of the mean forecast
# - mean_ci_lower → Lower bound of 95% confidence interval
# - mean_ci_upper → Upper bound of 95% confidence interval
forecast_df = predicted_auto_SARIMA.summary_frame(alpha=0.05)

# Show the first few forecasted values
forecast_df.head()
Out[454]:
Sparkling mean mean_se mean_ci_lower mean_ci_upper
1992-01-31 -5.604998e+22 NaN NaN NaN
1992-02-29 7.785341e+22 NaN NaN NaN
1992-03-31 -1.081384e+23 NaN NaN NaN
1992-04-30 1.502042e+23 NaN NaN NaN
1992-05-31 -2.086336e+23 NaN NaN NaN
In [455]:
# ✅ RMSE = sqrt(MSE)
rmse = np.sqrt(mean_squared_error(df_test['Sparkling'], predicted_auto_SARIMA.predicted_mean))

# ✅ MAPE using our custom function
mape = mean_absolute_percentage_error(df_test['Sparkling'], predicted_auto_SARIMA.predicted_mean)

# ✅ Nicely formatted output
print(f"📌 SARIMA Model Performance on Test Data:")
print(f"   ➤ RMSE : {rmse:,.2f}")
print(f"   ➤ MAPE : {mape:.2f}%")
📌 SARIMA Model Performance on Test Data:
   ➤ RMSE : 12,132,559,640,424,642,843,221,426,176.00
   ➤ MAPE : 251533539866592360997584896.00%
In [456]:
# ✅ Create a temporary dataframe to store SARIMA model performance
# - RMSE: Root Mean Squared Error (accuracy in unit scale)
# - MAPE: Mean Absolute Percentage Error (accuracy in %)
# Index is named after the specific SARIMA model tested
temp_resultsDf = pd.DataFrame(
    {'RMSE': rmse, 'MAPE': mape},
    index=['SARIMA(3,1,2)(1,0,3,12)']  # ✅ Updated to match model order: (3,1,2)
)

# ✅ Append SARIMA results to the existing model performance table (resultsDf)
# Ensures all models can be compared in one consolidated table
resultsDf = pd.concat([resultsDf, temp_resultsDf])

# ✅ Display updated table including ARIMA and SARIMA performance
resultsDf
Out[456]:
RMSE MAPE
ARIMA(2,1,2) 1.309631e+03 4.210620e+01
SARIMA(3,1,2)(1,0,3,12) 1.213256e+28 2.515335e+26
In [457]:
resultsDf2 = resultsDf.sort_values(by='RMSE', ascending=True)

print("📊 Model Performance Ranking (Lowest RMSE = Best):")
display(resultsDf2.style.format({'RMSE': '{:,.2f}', 'MAPE': '{:.2f}%'}))
📊 Model Performance Ranking (Lowest RMSE = Best):
  RMSE MAPE
ARIMA(2,1,2) 1,309.63 42.11%
SARIMA(3,1,2)(1,0,3,12) 12,132,559,640,424,642,843,221,426,176.00 251533539866592360997584896.00%
In [458]:
# ✅ Extract only the RMSE column from the sorted results table
# resultsDf2 already contains performance results for all models
resultsDf3 = resultsDf2[['RMSE']].copy()  # copy() helps avoid SettingWithCopy warnings

# ✅ Rename the column for clarity in report tables
# 'Test RMSE' clearly indicates performance measured on the test dataset
resultsDf3.rename(columns={'RMSE': 'Test RMSE'}, inplace=True)

# ✅ Display the final performance comparison table (RMSE only)
resultsDf3
Out[458]:
Test RMSE
ARIMA(2,1,2) 1.309631e+03
SARIMA(3,1,2)(1,0,3,12) 1.213256e+28
In [459]:
# ✅ Keep only the RMSE column for comparison table
resultsDf3 = resultsDf2[['RMSE']].copy()  # copy() prevents SettingWithCopyWarning

# ✅ Rename the column for clear understanding in final report
resultsDf3.rename(columns={'RMSE': 'Test RMSE'}, inplace=True)

# ✅ Display final table of model performance (RMSE only)
display(resultsDf3.style.format({'Test RMSE': '{:,.2f}'}))
  Test RMSE
ARIMA(2,1,2) 1,309.63
SARIMA(3,1,2)(1,0,3,12) 12,132,559,640,424,642,843,221,426,176.00

Observations:

  • Coefficient instability warnings

  • Large standard errors with weak parameter significance

  • Residuals exhibit abnormal spikes

  • QQ plot strongly deviates and not normally distributed

  • Histogram is unrealistic

  • Suggests model overfitting / poor convergence

  • Seasonal structure not modeled correctly, or wrong parameter search

  • The SARIMA model tested had unstable residuals with extremely high error values, which means the seasonal parameters used did not suit the data and the model is not appropriate in its current configuration.

Model Comparison and Final Model Selection¶

Rebuild best model on entire data and make next 12 month forecast

In [460]:
comparison_model_table = pd.concat([resultsDf1, resultsDf3])
comparison_model_table.sort_values(by = 'Test RMSE')
Out[460]:
Test RMSE
TES (α=0.50, β=0.30, γ=0.30) 4.126138e+02
SimpleAverageModel 1.268683e+03
SES_Model 1.293814e+03
ARIMA(2,1,2) 1.309631e+03
SES (Alpha = 0.30) 1.900059e+03
DES_Model 2.654534e+03
DES (Alpha=0.30, Beta=0.30) 1.381440e+04
Regression On Time Model 1.840430e+06
NaiveForecast 1.583972e+07
SARIMA(3,1,2)(1,0,3,12) 1.213256e+28
In [461]:
# ✅ Combine model performance results
comparison_model_table = pd.concat([resultsDf1, resultsDf3])

# ✅ Sort models by Test RMSE (Lowest RMSE = Best Performance)
comparison_model_table = comparison_model_table.sort_values(by='Test RMSE', ascending=True)

# ✅ Display formatted table
print("📊 Model Comparison and Final Model Selection (Ranked by Test RMSE)")
display(comparison_model_table.style.format({'Test RMSE': '{:,.2f}'}))
📊 Model Comparison and Final Model Selection (Ranked by Test RMSE)
  Test RMSE
TES (α=0.50, β=0.30, γ=0.30) 412.61
SimpleAverageModel 1,268.68
SES_Model 1,293.81
ARIMA(2,1,2) 1,309.63
SES (Alpha = 0.30) 1,900.06
DES_Model 2,654.53
DES (Alpha=0.30, Beta=0.30) 13,814.40
Regression On Time Model 1,840,430.14
NaiveForecast 15,839,720.95
SARIMA(3,1,2)(1,0,3,12) 12,132,559,640,424,642,843,221,426,176.00

Observations:

  • Among all the evaluated methods for forecasting, the Triple Exponential Smoothing model (Holt Winters) with alpha=0.50, beta=0.30, gamma=0.30 yielded the lowest RMSE (approx. 412.61), turning it into the best performing model to forecast Sparkling Wine sales.

  • The Naive and SARIMA models basically failed, while ARIMA and the other smoothing models gave moderate results but did not match the performance of the tuned Holt Winters model.

Forecasting using Final Model¶

In [462]:
model = ExponentialSmoothing(df['Sparkling'],trend='additive',seasonal='multiplicative',freq='M')
# fit model
model_fit = model.fit(smoothing_level=0.3,smoothing_trend=0.3,smoothing_seasonal=0.3,optimized=False,use_brute=True)
# make prediction
yhat = model_fit.predict(start='31-08-1995',end='31-08-1996')
In [463]:
plt.figure(figsize=(18,9))  # Large figure for presentation quality

# ✅ Plot the original Sparkling sales data
plt.plot(df['Sparkling'], label='Actual Sales Data', linewidth=2)

# ✅ Plot model predictions (future forecast)
plt.plot(yhat,
         label='TES Forecast (α = β = γ = 0.3)',
         linestyle='--',
         linewidth=2)

# ✅ Visualization enhancements
plt.title("Triple Exponential Smoothing (Holt-Winters) Forecast", fontsize=16)
plt.xlabel("Time", fontsize=14)
plt.ylabel("Sparkling Wine Sales", fontsize=14)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(loc='best', fontsize=12)
plt.tight_layout()

plt.show()
No description has been provided for this image
  • In the below code, we have calculated the upper and lower confidence bands at 95% confidence level
  • Here we are taking the multiplier to be 1.96 as we want to plot with respect to a 95% confidence intervals.
In [464]:
# ✅ Create a forecast results DataFrame including confidence intervals

# Standard deviation of residuals → used for CI width
resid_std = np.std(model_fit.resid, ddof=1)

pred_1_df = pd.DataFrame({
    'lower_CI': yhat - 1.96 * resid_std,   # Lower 95% confidence interval
    'prediction': yhat,                    # Forecasted values
    'upper_CI': yhat + 1.96 * resid_std    # Upper 95% confidence interval
})

# ✅ Display the first few rows of forecast with CI
pred_1_df.head()
Out[464]:
lower_CI prediction upper_CI
1995-08-31 1014.069331 1854.686308 2695.303286
1995-09-30 1650.369143 2490.986121 3331.603098
1995-10-31 2498.588142 3339.205119 4179.822096
1995-11-30 3415.513796 4256.130773 5096.747751
1995-12-31 6048.795192 6889.412170 7730.029147
In [465]:
pred_1_df.tail()
Out[465]:
lower_CI prediction upper_CI
1996-04-30 1573.333217 2413.950194 3254.567172
1996-05-31 1335.714857 2176.331834 3016.948811
1996-06-30 1212.044951 2052.661928 2893.278906
1996-07-31 1597.967939 2438.584916 3279.201894
1996-08-31 1522.187178 2362.804155 3203.421132
In [466]:
# ✅ Plot Forecast vs Actual with Confidence Intervals

# Create a large, clear figure for presentation
plt.figure(figsize=(18,9))

# ✅ Plot the historical actual sales data
plt.plot(df['Sparkling'],
         label='Actual Data',
         linewidth=2)

# ✅ Plot the forecasted values from Triple Exponential Smoothing (TES)
plt.plot(pred_1_df['prediction'],
         label='TES Forecast (α = β = γ = 0.3)',
         linestyle='--',
         linewidth=2)

# ✅ Add 95% Confidence Interval shading around the forecast
# The shaded area represents uncertainty in future predictions
plt.fill_between(pred_1_df.index,
                 pred_1_df['lower_CI'],
                 pred_1_df['upper_CI'],
                 color='gray',
                 alpha=0.3,
                 label='95% Confidence Interval')

# ✅ Labels and formatting for readability and professional presentation
plt.title("Triple Exponential Smoothing (TES) Forecast with 95% Confidence Interval",
          fontsize=16)

plt.xlabel("Time", fontsize=14)
plt.ylabel("Sparkling Sales", fontsize=14)

plt.legend(loc='best', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()

# ✅ Display the final plot
plt.show()
No description has been provided for this image

Observations:

  • The Holt-Winters Triple Exponential Smoothing model with parameters alpha = 0.50, beta = 0.30, gamma = 0.30 provides the best forecast accuracy, effectively capturing the overall level, upward trend, and strong seasonal spikes of sparkling wine sales.

  • The 95% confidence interval remains stable, suggesting low forecast uncertainty and a well fitted model.

Business Insights and Recommendations¶

Business Insights¶

  • Strong, consistent seasonality, shoulder lifts usually begin in October or November, with softer demand in May or August, sales peak every December.

  • Long term upward trend, underlying demand is rising as annual totals have generally increased over the period.

  • Demand that is right skewed, most months are in the middle range, with a few exceptionally high months driving a significant portion of volume.

  • Performance of the model: Holt Winters (TES) has the lowest RMSE and best captures trend and seasonality. ARIMA performs mediocrely while naive, average, and linear perform worse.

  • Residual diagnostics: Behaviour is fit by multiplicative seasonality (peaks increase with overall level).

Recommendations¶

  1. Planning for Inventory and Supplies
  • Create stock in September, peak in November, and decrease in January through February by implementing seasonal inventory profiles.

  • Establish monthly procurement and master production schedule (MPS) plans using TES forecasts.

  1. Marketing and Sales
  • Beginning in early November, launch holiday marketing campaigns (gifting, party packages) and continue them through the new year.

  • Use shoulder season promotions (May to August) to streamline usage, these could include value packs, small discounts, and cross selling with still wines.

  • Create limited edition seasonal SKUs (like "Holiday Sparkling Edition") to capitalise on December's elasticity.

  1. Revenue and Price Management
  • Utilise calendar based pricing, tactically support prices off peak and hold prices during peak weeks.

  • Use mix/bundle techniques (accessory/cheese pairings + sparkle) to increase basket size.

  1. Staffing and Operations
  • Scale labour and logistics for November and December, hire temporary warehouse workers, extend the time for picking and packing, and reserve space for carriers.

  • Place inventory ahead of time in regional DCs to cut lead time and keep stock levels high during peak times.

  1. Store and Channel
  • Get secure feature placements and end caps for the fourth quarter, and in October, negotiate guaranteed shelf facings.

  • For Direct to Consumer/e-commerce, let people place pre orders and promise to ship by a certain date for holiday delivery.

  1. Financial Planning & Analysis / Sales & Operations Planning
  • Every month, make a 12 to 18 month forecast and line it up with the S&OP cadence (Demand Supply Finance).

  • Check the accuracy of your forecasts by month (MAPE, WAPE) and service level, and make your goals more strict for Q4.

  1. Improvements to analytics (next iterations)
  • Make models for different types of wine (sparkling, red, white, rose) to get the best portfolio.

  • Add outside factors like holidays, sales, prices, weather, and general mood.

  • Do scenario planning (Base, Stretch, and Downside) to figure out how much capacity and cash you need.

KPIs to Monitor¶

  • Forecast Accuracy (MAPE/WAPE) by SKU and month

  • Fill Rate / OTIF (on time, in full) in November and December

  • Days with the most stockouts and backorders

  • Gross margin by channel and promotion Return on Investment

  • Days of Supply and Inventory Turns (especially in Q4)

  • Actions to Take Right Away (Next 30 to 60 Days)

  • Use the TES model as the production forecast for sparkling wine and other types of wine.

  • Make a plan for demand for the next 12 months and set aside space for Q4.

  • By the end of the third quarter, make sure the holiday promo calendar and store placements are set.

  • Set up a weekly peak season control tower for supply, sales, and logistics from November to December.


In [467]:
from google.colab import files
uploaded_files = files.upload()
Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.
Saving Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb
In [468]:
if len(uploaded_files) == 1:
    notebook_name = next(iter(uploaded_files))
    print(f"Converting notebook {notebook_name} to HTML.")
    !pip install nbconvert
    !jupyter nbconvert "$notebook_name" --to html
    html_name = notebook_name.rsplit('.', 1)[0] + '.html'
    files.download(html_name)
else: # Corrected indentation for the else block
    print("Please upload exactly one .ipynb file.")
Converting notebook Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to HTML.
Requirement already satisfied: nbconvert in /usr/local/lib/python3.12/dist-packages (7.16.6)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (4.13.5)
Requirement already satisfied: bleach!=5.0.0 in /usr/local/lib/python3.12/dist-packages (from bleach[css]!=5.0.0->nbconvert) (6.3.0)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.12/dist-packages (from nbconvert) (0.7.1)
Requirement already satisfied: jinja2>=3.0 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (3.1.6)
Requirement already satisfied: jupyter-core>=4.7 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (5.9.1)
Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.12/dist-packages (from nbconvert) (0.3.0)
Requirement already satisfied: markupsafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (3.0.3)
Requirement already satisfied: mistune<4,>=2.0.3 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (3.1.4)
Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (0.10.2)
Requirement already satisfied: nbformat>=5.7 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (5.10.4)
Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from nbconvert) (25.0)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (1.5.1)
Requirement already satisfied: pygments>=2.4.1 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (2.19.2)
Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.12/dist-packages (from nbconvert) (5.7.1)
Requirement already satisfied: webencodings in /usr/local/lib/python3.12/dist-packages (from bleach!=5.0.0->bleach[css]!=5.0.0->nbconvert) (0.5.1)
Requirement already satisfied: tinycss2<1.5,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from bleach[css]!=5.0.0->nbconvert) (1.4.0)
Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.12/dist-packages (from jupyter-core>=4.7->nbconvert) (4.5.0)
Requirement already satisfied: jupyter-client>=6.1.12 in /usr/local/lib/python3.12/dist-packages (from nbclient>=0.5.0->nbconvert) (7.4.9)
Requirement already satisfied: fastjsonschema>=2.15 in /usr/local/lib/python3.12/dist-packages (from nbformat>=5.7->nbconvert) (2.21.2)
Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.12/dist-packages (from nbformat>=5.7->nbconvert) (4.25.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.12/dist-packages (from beautifulsoup4->nbconvert) (2.8)
Requirement already satisfied: typing-extensions>=4.0.0 in /usr/local/lib/python3.12/dist-packages (from beautifulsoup4->nbconvert) (4.15.0)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=2.6->nbformat>=5.7->nbconvert) (25.4.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=2.6->nbformat>=5.7->nbconvert) (2025.9.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=2.6->nbformat>=5.7->nbconvert) (0.37.0)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=2.6->nbformat>=5.7->nbconvert) (0.28.0)
Requirement already satisfied: entrypoints in /usr/local/lib/python3.12/dist-packages (from jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (0.4)
Requirement already satisfied: nest-asyncio>=1.5.4 in /usr/local/lib/python3.12/dist-packages (from jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (1.6.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (2.9.0.post0)
Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.12/dist-packages (from jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (26.2.1)
Requirement already satisfied: tornado>=6.2 in /usr/local/lib/python3.12/dist-packages (from jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (6.5.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.12->nbclient>=0.5.0->nbconvert) (1.17.0)
[NbConvertApp] Converting notebook Full_code_notebook_TSF_project_Joseph_Anthony_Tan.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 26 image(s).
[NbConvertApp] Writing 4933838 bytes to Full_code_notebook_TSF_project_Joseph_Anthony_Tan.html